I am editing a serialized string because it gives a parse error when deserialized.
From a long serialized string, I want to replace "myVar\": \"0.2 mm" with "myVar\": \"0.2".
The following code works:
string NewString = Serializedstring.Replace($"myVar\": \"0.2 mm", $"myVar\": \"0.2");
But 0.2 is a variable that may change with every occurrence, so all I want is to remove mm from the string "myVar\": \"0.2 mm".
If your JSON is coming in with two different formats, rather than trying to hack the JSON string into something usable, it is much safer to use a custom JsonConverter. For example:
public class MillimetreJsonConverter : JsonConverter<double>
{
public override double Read(ref Utf8JsonReader reader, Type typeToConvert,
JsonSerializerOptions options)
{
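// If the token is already a JSON number, simply return it.
// (TryGetDouble throws InvalidOperationException on a non-number token,
// so check the token type first.)
if (reader.TokenType == JsonTokenType.Number)
{
return reader.GetDouble();
}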
// Otherwise we get the string value e.g. "0.2 mm" and
// do some simple string manipulation on it
var value = reader.GetString()!;
value = value.Replace(" mm", "");
if(double.TryParse(value, out var result))
{
return result;
}
// If we get here, perhaps we should throw an exception?
return 0;
}
public override void Write(Utf8JsonWriter writer, double value,
JsonSerializerOptions options)
{
// You can fill this in if you need it
throw new NotImplementedException();
}
}
Now you can modify the class you deserialise into, assuming your JSON looks like {"myVar": "0.2 mm"}:
public class Foo
{
[JsonConverter(typeof(MillimetreJsonConverter))]
public double myVar { get; set; }
}
Finally, it's simple to deserialise:
var foo = JsonSerializer.Deserialize<Foo>(json);
I can recommend two alternative approaches.
1) Match what you can.
var str = Serializedstring.Replace(" mm\"", "\"");
If your string is not going to contain a space followed by mm" anywhere else, that should be fine.
2) A safer option would be to deserialize the property into something that can handle it, such as a string (I assume you're currently deserializing that property to a number), then strip the unit from the property value and map it to what you need.
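A minimal sketch of option 2, assuming System.Text.Json and a JSON property named myVar. The class name RawFoo and the property names MyVarRaw and MyVar are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO: the raw JSON text is kept as a string,
// and the numeric value is derived from it on demand.
public class RawFoo
{
    [JsonPropertyName("myVar")]
    public string MyVarRaw { get; set; } = "";

    // Not part of the JSON: strips the " mm" unit and parses the rest.
    // Throws FormatException if what remains is not a number.
    [JsonIgnore]
    public double MyVar =>
        double.Parse(MyVarRaw.Replace(" mm", "").Trim(), CultureInfo.InvariantCulture);
}
```

With this, JsonSerializer.Deserialize<RawFoo>("{\"myVar\": \"0.2 mm\"}") yields an object whose MyVar is 0.2, and the same works when the value arrives as "0.2" without the unit.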
There are lots of ways to do this: regular expressions, string tokenisation, char array searching.
Regex is probably the closest to what you are currently doing.
The regular expression syntax is described here: Basics of RegEx
The pattern [0-9.]+ should match a decimal number, but be warned that it will also match anything made up of digits and dots, such as IP addresses.
So if you have a regex of
Regex rx = new Regex(@"([0-9.]+) mm");
<your string> = rx.Replace(<your string>, @"$1");
the details are:
[0-9.] any digit or dot
+ one or more of whatever preceded it
() a group that is of special interest
$1 the number of the group that you want to replace the match with
Note that $0 is the entire match, not the entire input string.
This will replace any occurrence of (number) mm with just the number.
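As a slightly stricter variant of the idea above, here is a sketch that only strips mm when it directly precedes a closing quote, which should reduce accidental matches elsewhere in the string. The class and method names (UnitStripper, StripMillimetres) are made up for this example, and the pattern assumes the values are plain decimals like 0.2 or 15:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnitStripper
{
    // (\d+(?:\.\d+)?)  an integer or decimal number, captured as group 1
    // " mm"            the unit to drop, preceded by a single space
    // (?="")           lookahead: the next character must be a double quote
    private static readonly Regex MillimetreValue =
        new Regex(@"(\d+(?:\.\d+)?) mm(?="")");

    // Replaces every "<number> mm" that ends a quoted value with "<number>".
    public static string StripMillimetres(string json) =>
        MillimetreValue.Replace(json, "$1");
}
```

Calling UnitStripper.StripMillimetres("{\"myVar\": \"0.2 mm\"}") returns {"myVar": "0.2"}, whatever the number happens to be.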
I would like .net core System.Text.Json to ignore the single quote character when escaping characters for serialization but I just can't get it to work:
var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
encoderSettings.AllowCharacters('\u0027');
var options = new JsonSerializerOptions{
Encoder = JavaScriptEncoder.Create(encoderSettings)
};
System.Text.Json.JsonSerializer.Serialize(new { text = "abc 'zorro' 123" }, options);
This will result in a string:
{"text":"abc \u0027zorro\u0027 123"}
When I would like it to be
{"text":"abc 'zorro' 123"}
Any ideas here? Just want to not escape the single quotes. I've also tried to replace the \u0027 with a \'.
If I do it like this - it works:
options = new JsonSerializerOptions
{
Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};
jsonString = JsonSerializer.Serialize(new { text = "abc 'zorro' 123" }, options);
...but this will also disable escaping for all characters including <, > and & (excepting only characters that the JSON standard requires to be escaped), which I also don't want.
This behavior is documented in How to customize character encoding with System.Text.Json:
Block lists
The preceding sections show how to specify allow lists of code points or ranges that you don't want to be escaped. However, there are global and encoder-specific block lists that can override certain code points in your allow list. Code points in a block list are always escaped, even if they're included in your allow list.
Global block list
The global block list includes things like private-use characters, control characters, undefined code points, and certain Unicode categories, such as the Space_Separator category, excluding U+0020 SPACE. ... <snip>
Encoder-specific block lists
Examples of encoder-specific blocked code points include '<' and '&' for the HTML encoder, '\' for the JSON encoder, and '%' for the URL encoder. ... <snip>
So, as documented, JavaScriptEncoder.Create() may override your allowed characters and escape certain "blocked" characters. While the full set of blocked characters is not documented, from the reference source, JavaScriptEncoder.Create(TextEncoderSettings settings) constructs an encoder that blocks "HTML sensitive" characters, which are defined in AllowedBmpCodePointsBitmap.cs and include ':
public void ForbidHtmlCharacters()
{
ForbidChar('<');
ForbidChar('>');
ForbidChar('&');
ForbidChar('\''); // can be used to escape attributes
ForbidChar('\"'); // can be used to escape attributes
ForbidChar('+'); // technically not HTML-specific, but can be used to perform UTF7-based attacks
}
If you do not want to use JavaScriptEncoder.UnsafeRelaxedJsonEscaping but also don't want to have ' escaped, you could create a custom JsonConverter<string> that manually pieces together the required encoded JSON string, then writes it out using Utf8JsonWriter.WriteRawValue() (which was first introduced in .NET 6):
public class StringConverter : JsonConverter<string>
{
readonly static Lazy<JavaScriptEncoder> Encoder = new (() =>
{
var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
encoderSettings.AllowCharacters('\u0027');
return JavaScriptEncoder.Create(encoderSettings);
});
public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) => reader.GetString();
public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
{
var encoder = Encoder.Value;
using var textWriter = new StringWriter();
textWriter.Write("\"");
foreach (var (startIndex, characterCount, final) in value.SplitIndices('\''))
{
encoder.Encode(textWriter, value, startIndex, characterCount);
if (!final)
textWriter.Write('\'');
}
textWriter.Write("\"");
writer.WriteRawValue(textWriter.ToString(), true);
}
}
public static class StringExtensions
{
public static IEnumerable<(int startIndex, int characterCount, bool final)> SplitIndices(this string value, char separator)
{
if (value == null)
throw new ArgumentNullException(nameof(value));
int index = 0;
int nextIndex;
while ((nextIndex = value.IndexOf(separator, index)) >= 0)
{
yield return (index, nextIndex - index, false);
index = nextIndex + 1;
}
yield return (index, value.Length - index, true);
}
}
Then serialize as follows:
var model = new { text = "abc 'zorro' 123" };
var options = new JsonSerializerOptions
{
Converters = { new StringConverter() },
};
var json = JsonSerializer.Serialize(model, options);
Which results in {"text":"abc 'zorro' 123"} as required. Demo fiddle here.
You could also try to create your own JavaScriptEncoder subclass that ignores global block lists, though that would likely be more involved than creating the custom converter.
I am confused by all the different escaping mechanisms for strings in C#. What I want is an escaping/unescaping method that:
1) Can be used on any string
2) escape+unescape is guaranteed to return the initial string
3) Replaces all punctuation with something else. If that is too much to ask, then at least commas, braces, and #. I am fine with spaces not being escaped.
4) Is unlikely to ever change.
Does it exist?
EDIT: This is for the purpose of serializing and deserializing app-generated attributes. So my object may or may not have values for Attribute1, Attribute2, Attribute3, etc. Simplifying a bit, the idea is to do something like the below. The goal is to have the encoded collection be brief and more or less human-readable.
I am asking what methods would make sense to use for Escape and Unescape.
public abstract class GenericAttribute {
const string key1 = "KEY1"; //It is fine to put some restrictions on the keys, i.e. no punctuation
const string key2 = "KEY2";
public abstract string Encode(); // NO RESTRICTIONS ON WHAT ENCODE MIGHT RETURN
public static GenericAttribute FromKeyValuePair (string key, string value) {
switch (key) {
case key1: return new ConcreteAttribute1(value);
case key2: return new ConcreteAttribute2(value);
// etc.
default: throw new ArgumentException("Unknown key: " + key, nameof(key));
}
}
}
public class AttributeCollection {
Dictionary <string, GenericAttribute> Content {get;set;}
public string Encode() {
string r = "";
bool first = true;
foreach (KeyValuePair<string, GenericAttribute> pair in this.Content) {
if (first) {
first = false;
} else {
r+=",";
}
r+=(pair.Key + "=" + Escape(pair.Value.Encode()));
}
return r;
}
public AttributeCollection(string encodedCollection) {
// input string is the return value of the Encode method
this.Content = new Dictionary<string, GenericAttribute>();
string[] array = encodedCollection.Split(',');
foreach(string component in array) {
int equalsIndex = component.IndexOf('=');
string key = component.Substring(0, equalsIndex);
string value = component.Substring(equalsIndex+1);
GenericAttribute attribute = GenericAttribute.FromKeyValuePair(key, Unescape(value));
this.Content[key]=attribute;
}
}
}
I'm not entirely sure what you're asking, but I believe your intent is for the escape character to be included in the string as-is.
var content = @"\'Hello";
Console.WriteLine(content);
// Output:
// \'Hello
By using the @ prefix (a verbatim string literal), the backslash is kept literally, making it a part of your string. That covers the server side in C#; how to account for other languages and escape formats is something only you would know.
You can find some great information on C# escaping here:
MSDN Blog
Try using HttpServerUtility.UrlEncode and HttpServerUtility.UrlDecode. I think that will encode and decode all the things you want.
See the MSDN Docs and here is a description of the mapping on Wikipedia.
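For reference, here is a rough sketch of the same percent-encoding idea without a dependency on System.Web. Note that it swaps in Uri.EscapeDataString and Uri.UnescapeDataString rather than HttpServerUtility, and the wrapper class AttributeEscaper is a made-up name. It escapes commas, braces, = and #, but leaves letters, digits and the characters - _ . ~ untouched, so it only partly meets requirement 3:

```csharp
using System;

public static class AttributeEscaper
{
    // Percent-encodes everything except unreserved URI characters
    // (letters, digits, '-', '_', '.', '~').
    public static string Escape(string value) => Uri.EscapeDataString(value);

    // Reverses Escape by decoding the percent-encoded sequences.
    public static string Unescape(string value) => Uri.UnescapeDataString(value);
}
```

For example, AttributeEscaper.Escape("a,b={c}#d") gives a%2Cb%3D%7Bc%7D%23d, and Unescape turns that back into the original text, so the escaped value can safely sit between the , and = separators used by AttributeCollection.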
I have the following type of string format:
Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}
The string contains pairs of { } at different positions, and there may be any number of them.
Now I want to replace each pair with another string, which I will compute from the text between the { } pair.
How can I do this?
You could try regular expressions. Specifically, Regex.Replace variants using MatchEvaluator should do the trick. See http://msdn.microsoft.com/en-US/library/cft8645c(v=vs.80).aspx for more information.
Something along these lines:
using System;
using System.Text.RegularExpressions;
public class Replacer
{
public string Replace(string input)
{
// The regular expression passed as the second argument to the Replace method
// matches strings in the format "{value0#value1/value2}", i.e. three strings
// separated by "#" and "/" all surrounded by braces.
var result = Regex.Replace(
input,
@"{(?<value0>[^#]+)#(?<value1>[^/]+)/(?<value2>[^}]+)}",
ReplaceMatchEvaluator);
return result;
}
private string ReplaceMatchEvaluator(Match m)
{
// m.Value contains the matched string including the braces.
// This method is invoked once per matching portion of the input string.
// We can then extract each of the named groups in order to access the
// substrings of each matching portion as follows:
var value0 = m.Groups["value0"].Value; // Contains first value, e.g. "Jwala Vora"
var value1 = m.Groups["value1"].Value; // Contains second value, e.g. "3"
var value2 = m.Groups["value2"].Value; // Contains third value, e.g. "13"
// Here we can do things like convert value1 and value2 to integers...
var intValue1 = Int32.Parse(value1);
var intValue2 = Int32.Parse(value2);
// etc.
// Here we return the value with which the matching portion is replaced.
// This would be some function of value0, value1 and value2 as well as
// any other data in the Replacer class.
return "xyz";
}
}
public static class Program
{
public static void Main(string[] args)
{
var replacer = new Replacer();
var result = replacer.Replace("Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}");
Console.WriteLine(result);
}
}
This program will output Proposal is given to xyz for xyz xyz by xyz.
You'll need to provide your app-specific logic in the ReplaceMatchEvaluator method to process value0, value1 and value2 as appropriate. The class Replacer can contain additional members that can be used to implement the replacement logic in ReplaceMatchEvaluator. Strings are processed by calling Replace on an instance of the Replacer class.
Well, you can split the string by '{' and '}' and determine the contents that way.
But I think a better way would be to find the characters by index: once you know the start and end index of each pair of curly brackets, you can reconstruct the string with the placeholders replaced.
The best method may be Regex.Replace. With a plain replacement string it only substitutes fixed values, though, and I think your requirement is also to parse the text inside the curly brackets and choose the inserted value based on it, so you would need the overload that takes a MatchEvaluator. See also: Find and Replace a section of a string with wildcard type search
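The index-based approach could look roughly like the following sketch. The names BraceReplacer, ReplaceBraced and the compute callback are invented for illustration, and it assumes braces are never nested or escaped:

```csharp
using System;
using System.Text;

public static class BraceReplacer
{
    // Scans for "{...}" pairs by index and replaces each pair with
    // compute(<text between the braces>). Text outside braces is copied as-is.
    public static string ReplaceBraced(string input, Func<string, string> compute)
    {
        var sb = new StringBuilder();
        int pos = 0;
        while (true)
        {
            int open = input.IndexOf('{', pos);
            int close = open < 0 ? -1 : input.IndexOf('}', open + 1);
            if (open < 0 || close < 0)
            {
                // No complete pair left: copy the remainder and stop.
                sb.Append(input, pos, input.Length - pos);
                return sb.ToString();
            }
            sb.Append(input, pos, open - pos);                    // text before '{'
            string inner = input.Substring(open + 1, close - open - 1);
            sb.Append(compute(inner));                            // computed replacement
            pos = close + 1;                                      // continue after '}'
        }
    }
}
```

For instance, BraceReplacer.ReplaceBraced("to {A#1/2} by {B#3/4}", s => s.Split('#')[0]) returns "to A by B"; the lambda is where the per-placeholder logic would go.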
You may use the Regex.Replace Method (String, String, MatchEvaluator) method and the {.*?} pattern. The following example uses a dictionary to replace the values, but you may replace this with your own logic.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class Program
{
static Dictionary<string, string> _dict = new Dictionary<string, string>();
static void Main(string[] args)
{
_dict.Add("{Jwala Vora#3/13}","someValue1");
_dict.Add("{Amazon Vally#2/11}", "someValue2");
_dict.Add("{1#3/75}", "someValue3");
_dict.Add("{MdOffice employee#1/1}", "someValue4");
var input = @"Proposal is given to {Jwala Vora#3/13} for {Amazon Vally#2/11} {1#3/75} by {MdOffice employee#1/1}";
var result = Regex.Replace(input, @"{.*?}", Evaluate);
Console.WriteLine(result);
}
private static string Evaluate(Match match)
{
return _dict[match.Value];
}
}
Can't you do something with string.Format()?
For example
string.Format("Proposal is given to {0} for {1} {2} by {3}", "Jwala Vora", "Amazon Vally", 1, "MdOffice employee");
In C#, I have a width I want to use for some strings, but I won't know that width until runtime. I'm doing something like this:
string.Format("{0, " + digits + "}", value) // prints 123 as " 123"
Is there a string formatting directive that lets me specify this without smashing my own format string together like this?
I looked around on MSDN for a little while and I feel like I'm missing a whole chapter on format strings or something.
Take a look at PadLeft:
s = "123".PadLeft(5); // Defaults to spaces
s = "123".PadLeft(5, '.'); // Pads with dots
You can use the PadLeft and PadRight methods:
http://msdn.microsoft.com/en-us/library/system.string.padleft%28VS.71%29.aspx
You can do something like
string test = valueString.PadLeft(10,' ');
or, even sillier, build the padding yourself:
string spaces = String.Concat(Enumerable.Repeat(" ", digits).ToArray());
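To tie PadLeft and PadRight back to the original question, here is a small sketch of a helper that follows the same sign convention as the alignment component in composite format strings (positive width right-aligns, negative width left-aligns). The names Padding and Align are hypothetical:

```csharp
using System;

public static class Padding
{
    // Positive width: right-align (pad on the left), like "{0,5}".
    // Negative width: left-align (pad on the right), like "{0,-5}".
    public static string Align(object value, int width)
    {
        string text = value?.ToString() ?? string.Empty;
        return width >= 0 ? text.PadLeft(width) : text.PadRight(-width);
    }
}
```

So Padding.Align(123, 5) produces two spaces followed by 123, with the width decided entirely at runtime and no format string assembled by hand.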
The functions mentioned by others will work, but this MSDN page has a more general solution to formatting that changes at runtime:
Composite Formatting
They give examples much like yours.
Edit: I thought you were trying to solve the general case of composing a format string at runtime. For example, if there were no built in PadLeft(), you could do this:
int myInt = 123;
int nColumnWidth = 10;
string fmt = string.Format("Price = |{{0,{0}}}|", nColumnWidth);
// now fmt = "Price = |{0,10}|"
string s = string.Format(fmt, myInt);
You can even do all that in one line, but it's ugly:
string s = string.Format(
string.Format("Price = |{{0,{0}}}|", nColumnWidth),
myInt);
Perhaps this will help with your research on formatting:
Formatting Types
Composite Formatting
However, I don't think you're going to do much better than this, as the alignment parameter must be part of the format string and does not seem to be represented by a property.
This is probably overkill, but it illustrates a way to encapsulate the format specification and use an overload of String.Format that accepts an IFormatProvider.
class Program
{
public static void Main(string[] args)
{
int digits = 7;
var format = new PaddedNumberFormatInfo(digits);
Console.WriteLine(String.Format(format, "{0}", 123));
}
}
class PaddedNumberFormatInfo : IFormatProvider, ICustomFormatter
{
public PaddedNumberFormatInfo(int digits)
{
this.DigitsCount = digits;
}
public int DigitsCount { get; set; }
// IFormatProvider Members
public object GetFormat(Type formatType)
{
if (formatType == typeof(ICustomFormatter))
return this;
return null;
}
// ICustomFormatter Members
public string Format(string format, object arg, IFormatProvider provider)
{
return String.Format(
String.Concat("{0, ", this.DigitsCount, "}"), arg);
}
}
I posted a CodeProject article that may be what you want.
See: A C# way for indirect width and style formatting.
Basically it is a method, FormatEx, that acts like String.Format, except it allows for indirect alignment and formatString specifiers.
FormatEx("{0,{1}:{2}}", value, width, formatString);
Means format the value of varArgs 0, in a field width specified by varArgs 1, using a formattingString code specified by varArgs 2.
Edit: Internally, it does what many others have suggested in their answers. I've just wrapped the parsing and determination of the final values to use for alignment and formatString. I also added a "center alignment" modifier.
-Jesse
String has a constructor that creates a string with a given character repeated n times.
https://msdn.microsoft.com/en-us/library/xsa4321w(v=vs.110).aspx
// prepends 'digits' spaces to the value (note: a fixed prefix, not padding to a total width of 'digits')
string.Format(new string(' ', digits) + "{0}", value)
I am trying to create a generic formatter/parser combination.
Example scenario:
I have a string for string.Format(), e.g. var format = "{0}-{1}"
I have an array of object (string) for the input, e.g. var arr = new[] { "asdf", "qwer" }
I am formatting the array using the format string, e.g. var res = string.Format(format, arr)
What I am trying to do is to revert back the formatted string back into the array of object (string). Something like (pseudo code):
var arr2 = string.Unformat(format, res)
// when: res = "asdf-qwer"
// arr2 should be equal to arr
Anyone have experience doing something like this? I'm thinking about using regular expressions (modify the original format string, and then pass it to Regex.Matches to get the array) and run it for each placeholder in the format string. Is this feasible or is there any other more efficient solution?
While the comments about lost information are valid, sometimes you just want to get the string values out of a string with known formatting.
One method is this blog post written by a friend of mine. He implemented an extension method called string[] ParseExact(), akin to DateTime.ParseExact(). Data is returned as an array of strings, but if you can live with that, it is terribly handy.
public static class StringExtensions
{
public static string[] ParseExact(
this string data,
string format)
{
return ParseExact(data, format, false);
}
public static string[] ParseExact(
this string data,
string format,
bool ignoreCase)
{
string[] values;
if (TryParseExact(data, format, out values, ignoreCase))
return values;
else
throw new ArgumentException("Format not compatible with value.");
}
public static bool TryExtract(
this string data,
string format,
out string[] values)
{
return TryParseExact(data, format, out values, false);
}
public static bool TryParseExact(
this string data,
string format,
out string[] values,
bool ignoreCase)
{
int tokenCount = 0;
format = Regex.Escape(format).Replace("\\{", "{");
for (tokenCount = 0; ; tokenCount++)
{
string token = string.Format("{{{0}}}", tokenCount);
if (!format.Contains(token)) break;
format = format.Replace(token,
string.Format("(?'group{0}'.*)", tokenCount));
}
RegexOptions options =
ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
Match match = new Regex(format, options).Match(data);
if (tokenCount != (match.Groups.Count - 1))
{
values = new string[] { };
return false;
}
else
{
values = new string[tokenCount];
for (int index = 0; index < tokenCount; index++)
values[index] =
match.Groups[string.Format("group{0}", index)].Value;
return true;
}
}
}
You can't unformat because information is lost. String.Format is a "destructive" algorithm, which means you can't (always) go back.
Since string is sealed in C#, you cannot inherit from it. Instead, create a wrapper class with members that keep track of the "{0}-{1}" and the { "asdf", "qwer" }, override ToString(), and modify your code a little to use it.
IMO, that's the best way to do this.
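A minimal sketch of such a wrapper, with FormattedString as an invented name. It simply remembers the format and arguments, so nothing ever has to be parsed back out of the formatted text:

```csharp
using System;

// Hypothetical wrapper: carries the format string and its arguments together.
public sealed class FormattedString
{
    public string Format { get; }
    public object[] Args { get; }

    public FormattedString(string format, params object[] args)
    {
        Format = format ?? throw new ArgumentNullException(nameof(format));
        Args = args ?? throw new ArgumentNullException(nameof(args));
    }

    // Produces the same text string.Format would have produced.
    public override string ToString() => string.Format(Format, Args);
}
```

With var fs = new FormattedString("{0}-{1}", "asdf", "qwer"), fs.ToString() gives asdf-qwer while fs.Args still holds the original values.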
It's simply not possible in the generic case. Some information will be "lost" (string boundaries) in the Format method. Assume:
String.Format("{0}-{1}", "hello-world", "stack-overflow");
How would you "Unformat" it?
Assuming "-" is not in the original strings, can you not just use Split?
var arr2 = formattedString.Split('-');
Note that this only applies to the presented example with an assumption. Any reverse algorithm is dependent on the kind of formatting employed; an inverse operation may not even be possible, as noted by the other answers.
A simple solution might be to
replace all format tokens with (.*)
escape all other special characters in format
make the regex match non-greedy
This would resolve the ambiguities to the shortest possible match.
(I'm not good at RegEx, so please correct me, folks :))
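The three steps above could be sketched like this. Unformatter and Unformat are hypothetical names, and the sketch assumes simple {n} tokens only (no {{ escapes, alignment, or format specifiers such as {0:N2}):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Unformatter
{
    public static string[] Unformat(string format, string formatted)
    {
        // Regex.Escape turns "{0}" into "\{0}"; rewrite each escaped token
        // into a lazy (non-greedy) named group g0, g1, ...
        string pattern = Regex.Replace(
            Regex.Escape(format), @"\\\{(\d+)}", "(?<g${1}>.*?)");

        // Anchor the pattern so the whole formatted string must match.
        Match match = Regex.Match(formatted, "^" + pattern + "$", RegexOptions.Singleline);
        if (!match.Success)
            throw new FormatException("Input does not match the format.");

        // Collect g0, g1, ... until a group name that was never defined.
        var values = new List<string>();
        for (int i = 0; match.Groups["g" + i].Success; i++)
            values.Add(match.Groups["g" + i].Value);
        return values.ToArray();
    }
}
```

Unformatter.Unformat("{0}-{1}", "asdf-qwer") returns asdf and qwer. For the ambiguous "hello-world-stack-overflow" case it resolves to hello and world-stack-overflow, which illustrates why the result is only a best guess when the separator can occur inside the values.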
After formatting, you can put the resulting string and the array of objects into a dictionary with the string as key:
Dictionary<string, string[]> unFormatLookup = new Dictionary<string, string[]>();
...
var arr = new string [] {"asdf", "qwer" };
var res = string.Format(format, arr);
unFormatLookup.Add(res,arr);
and in Unformat method, you can simply pass a string and look up that string and return the array used:
string [] Unformat(string res)
{
string [] arr;
unFormatLookup.TryGetValue(res, out arr); // you can also check the return value of TryGetValue and throw an exception if the input string is not in the dictionary
return arr;
}