I'm writing a high-performance parser for a comma-delimited network stream. My goal is to parse and convert directly from binary to .NET primitives. Based on my testing thus far, Span performance is incredible, but the type is difficult to work with due to the restrictions inherent to ref structs. I've hit a roadblock trying to find an efficient way to store Span constants (comma, newline, etc.) used throughout my application. The only solutions that seem to exist are to store them as byte arrays and convert them in method bodies, or to hardcode Span<byte> delimiter = Encoding.UTF8.GetBytes("\r\n") in every method body.
The following is what I'd like to achieve, but it gives the error `CS8345 Field or auto-implemented property cannot be of type 'Span' unless it is an instance member of a ref struct`.
public class Parser
{
Span<byte> NewLine = new byte[]{ (byte)'\n' };
}
There's got to be a better way! Please help!
You're running into issues because ref structs, like Span<T>, are special types with restrictions that ensure they cannot escape the stack. Classes are reference types that live on the heap, so a Span field in a class would break the "stack only" rule. However, static properties with an implemented getter (not auto-implemented ones such as { get; set; }) are allowed and seem to be the solution you're looking for. See the following example...
public class Parser
{
static Span<byte> NewLine => new byte[] { (byte)'\n' };
static ReadOnlySpan<byte> Comma => new byte[] { (byte)',' };
private static Span<byte> TraditionSyntax
{
get
{
return new[] {(byte)'\n' };
}
}
}
Note, the "string"u8.ToArray() syntax uses a UTF-8 literal that converts the string directly into UTF-8 bytes, as if you had called the encoder the way you did in your example. That said, make sure to test the literal to ensure it produces what you expect. It's not always consistent with Encoding.UTF8.
Don't get thrown off by the method body (property getter). There's a compiler optimization that avoids an allocation when a property returns a constant byte array, such as new byte[] { (byte)'\n' }, as a ReadOnlySpan<byte>.
You can create a ReadOnlySpan<byte> from a UTF-8 string literal in .NET 7 (C# 11):
class Consts
{
public static ReadOnlySpan<byte> Delim => "\n"u8;
}
Or use Memory/ReadOnlyMemory:
public class Consts
{
public static ReadOnlyMemory<int> Type { get; } = new []{1};
}
And usage:
ReadOnlySpan<int> span = Consts.Type.Span;
Or keep the array in a static field and expose it through a method or an expression-bodied property:
class Consts
{
private static readonly byte[] _delim = { (byte)'\n' };
public static ReadOnlySpan<byte> Delim => _delim;
}
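As a rough sketch of how such static properties might be consumed in the comma-delimited parser from the question: the Delimiters and CsvLineParser names and the SumFirstLine helper are hypothetical and only illustrate the pattern. Utf8Parser (from System.Buffers.Text) is used here to convert ASCII digits straight from bytes to int without creating a string, and the sketch assumes "\r\n" line endings.

```csharp
using System;
using System.Buffers.Text;

// Hypothetical holder for the delimiter constants.
public static class Delimiters
{
    // Constant byte arrays returned as ReadOnlySpan<byte> from a getter.
    public static ReadOnlySpan<byte> Comma => new byte[] { (byte)',' };
    public static ReadOnlySpan<byte> NewLine => new byte[] { (byte)'\r', (byte)'\n' };
}

// Hypothetical parser that uses the constants above.
public static class CsvLineParser
{
    // Sums the integer fields of the first line in the buffer.
    // Fields that do not parse as integers are skipped.
    public static int SumFirstLine(ReadOnlySpan<byte> buffer)
    {
        int end = buffer.IndexOf(Delimiters.NewLine);
        ReadOnlySpan<byte> line = end >= 0 ? buffer.Slice(0, end) : buffer;

        int sum = 0;
        while (!line.IsEmpty)
        {
            int comma = line.IndexOf(Delimiters.Comma);
            ReadOnlySpan<byte> field = comma >= 0 ? line.Slice(0, comma) : line;

            // Parse the bytes directly; no intermediate string is created.
            if (Utf8Parser.TryParse(field, out int value, out _))
            {
                sum += value;
            }

            line = comma >= 0 ? line.Slice(comma + 1) : ReadOnlySpan<byte>.Empty;
        }
        return sum;
    }
}
```

Because the delimiters are properties rather than fields, every method can read Delimiters.Comma without redeclaring the bytes locally.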
I have a question about creating an immutable struct inside a class definition. I want to define the struct outside the class but use that same struct type in the class definition while maintaining immutability. Will the code below achieve this?
namespace space
{
class Class1
{
public Struct Struct {get; set;}
}
public struct Struct
{
public Struct(string strVar)
{
StructVar = strVar;
}
public string StructVar {get;}
}
}
Also, if I have a struct within a struct like:
class Class1
{
public Struct2 Struct2 {get; set;}
}
public struct Struct2
{
public Struct2(string str, InStruct inStrct)
{
StrVar = str;
InStruct = inStrct;
}
public string StrVar {get;}
public InStruct InStruct {get;}
}
public struct InStruct
{
public InStruct(Array ary)
{
StrArray = ary;
}
public Array StrArray {get;}
}
Does this also maintain immutability?
Lastly, if the size of the array in the InStruct is likely to be quite long, should I not use a struct at all and just put the array itself into the class definition instead? Am I just going struct crazy?
My concern is that because I'm doing a {set;} in the class definition that I'm breaking a rule somewhere. I would put the struct in the class definition itself but I didn't like to have to continuously call class constructors over and over to create each struct, that kind of seemed to defeat the purpose of using a struct in the first place.
It's a little difficult to give a complete answer without understanding exactly what you are trying to accomplish, but I'll start with a few important distinctions.
First, in C#, the struct/class distinction isn't about mutability per se. You can have an immutable class, like this one
public class CannotBeMutated
{
private readonly string _someVal;
public CannotBeMutated(string someVal)
{
_someVal = someVal;
}
public string SomeVal => _someVal;
}
and a mutable struct, like this one
// This is not at all idiomatic C#, please don't use this as an example
public struct MutableStruct
{
private string _someVal;
public MutableStruct(string someVal)
{
_someVal = someVal;
}
public string GetVal()
{
return _someVal;
}
public void Mutate(string newVal)
{
_someVal = newVal;
}
}
Using the above struct I can do this
var foo = new MutableStruct("Hello");
foo.Mutate("GoodBye");
var bar = foo.GetVal(); // bar == "GoodBye"!
The difference between structs and classes is in variable passing semantics. When an object of a value type (e.g. a struct) is assigned to a variable, passed as a parameter to a method, or returned from one (including a property getter or setter), a copy of the object is made before it is passed to the new context. When an object of a reference type is passed as a parameter to or returned from a method, no copy is made, because we only pass a reference to the object's location in memory, rather than a copy of the object.
An additional point on struct 'copying'. Imagine you have a struct with a field that is a reference type, like this
public struct StructWithAReferenceType
{
public List<string> IAmAReferenceType {get; set;}
}
When you pass an instance of this struct into a method, a copy of the reference to the List will be copied, but the underlying data will not. So if you do
public void MessWithYourStruct(StructWithAReferenceType t)
{
t.IAmAReferenceType.Add("HAHA");
}
var s = new StructWithAReferenceType { IAmAReferenceType = new List<string>() };
MessWithYourStruct(s);
var count = s.IAmAReferenceType.Count; // 1!
// or even more unsettling
var s2 = new StructWithAReferenceType { IAmAReferenceType = new List<string>() };
var t2 = s2; // makes a COPY of s2
s2.IAmAReferenceType.Add("hi");
var count2 = t2.IAmAReferenceType.Count; // 1!
Even when a struct is copied, its reference type fields still refer to the same objects in memory.
The immutable/mutable and struct/class differences are somewhat similar, insofar as they are both about where and whether you can change the contents of an object in your program, but they are still very distinct.
Now on to your question. In your second example, Class1 is not immutable, as you can mutate the value of Struct2 like this
var foo = new Class1();
foo.Struct2 = new Struct2("a", new InStruct(new[] { "x" }));
var first = foo.Struct2; // a copy of the Struct2 holding "a"
foo.Struct2 = new Struct2("b", new InStruct(new[] { "y" }));
var second = foo.Struct2; // a copy of the Struct2 holding "b"
Struct2 is immutable, as there is no way for calling code to change the values of StrVar or InStruct once it is constructed. InStruct is similarly immutable. However, Array is not immutable. So InStruct is an immutable container for a mutable value, similar to an ImmutableList<List<string>>. While you can guarantee calling code does not change the value of InStruct.StrArray to a different array, you can do nothing about calling code changing the elements of the Array.
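To make the "immutable container, mutable contents" point concrete, here is a minimal sketch. ArrayHolder and ImmutableHolder are hypothetical names, not types from the question. The second struct shows one possible way to close the hole, assuming System.Collections.Immutable is available (it ships with modern .NET and is a NuGet package on .NET Framework).

```csharp
using System;
using System.Collections.Immutable;

// Immutable container, mutable contents: the array reference cannot be
// replaced, but anyone holding the array can still overwrite its elements.
public readonly struct ArrayHolder
{
    public ArrayHolder(string[] items) { Items = items; }
    public string[] Items { get; }
}

// Hypothetical alternative: ImmutableArray.Create copies the input array,
// and ImmutableArray<T> exposes no way to overwrite an element in place.
public readonly struct ImmutableHolder
{
    public ImmutableHolder(string[] items) { Items = ImmutableArray.Create(items); }
    public ImmutableArray<string> Items { get; }
}
```

With ArrayHolder, writing holder.Items[0] = "changed" compiles and also changes the caller's original array. With ImmutableHolder, later changes to the source array are not visible through Items.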
Finally, some generic advice related to your example.
First, mutable structs, or structs with mutable fields, are bad. The examples above should point to why structs with mutable fields are bad. And Eric Lippert himself has a great example of how terrible mutable structs can be on his blog.
Second, for most developers working in C# there's almost never a reason to create a user-defined value type (i.e. a struct). Local variables of value types are typically stored on the stack (or inline in their containing object), which makes access to them very fast. Objects of reference types are stored on the heap, and so are slower to allocate and access. But in the huge majority of C# programs, that distinction is going to be dwarfed by the time cost of disk I/O, network I/O, reflection in serialization code, or even initialization and manipulation of collections. For ordinary developers who aren't writing performance-critical standard libraries, there's almost no reason to think about the performance implications of the difference. Heck, developers in Java, Python, Ruby, JavaScript and many other languages get by totally without user-defined value types. The added cognitive overhead they introduce for developers is almost never worth any benefit you might see. Also, remember that large structs must be copied whenever they are passed or assigned to a variable, and can actually be a performance problem.
TL;DR you probably shouldn't use structs in your code, and they don't really have anything to do with immutability.
This is probably an incredibly dumb question but: I have a function that takes in a string, and I want to make sure that the string is a constant from a specific class. Essentially the effect I'm looking for is what enums do:
enum MyEnum {...}
void doStuff(MyEnum constValue) {...}
Except with strings:
static class MyFakeStringEnum {
public const string Value1 = "value1";
public const string Value2 = "value2";
}
// Ideally:
void doStuff(MyFakeStringEnum constValue) {...}
// Reality:
void doStuff(string constValue) {...}
I know this can technically be achieved by doing something like
public static class MyFakeStringEnum {
public struct StringEnumValue {
public string Value { get; private set; }
public StringEnumValue(string v) { Value = v; }
}
public static readonly StringEnumValue Value1 = new StringEnumValue("value1");
public static readonly StringEnumValue Value2 = new StringEnumValue("value2");
}
void doStuff(MyFakeStringEnum.StringEnumValue constValue) {...}
But it feels kind of overkill to make an object for just storing one single value.
Is this something doable without the extra code layer and overhead?
Edit: While an enum can indeed be used for a string, I'd like to avoid it for several reasons:
The string values may not always be a 1:1 translation from the enum. If I have a space in there, different capitalization, a different character set/language, etc. I'd have to transform the enum in every function where I want to use it. It might not be a lot of overhead or a performance hit in any way, but it still should be avoided--especially when it means that I'm always mutating something that should be constant.
Even if I use a separate string array map to solve the above problem, I would still have to access the translations instead of just being able to use the enum directly. A map would also mean having two sources for the same data.
I'm interested in this concept for different data types, ex. floats, ulongs, etc. that cannot be easily represented by enum names or stored as an enum value.
As for string -> enum, the point of using an enum in the first place for me is that I can rely on intellisense to give me a constant that exists; I don't want to wait until compile time or runtime to find out. Passing in an actual string would be duck typing and that's something I definitely don't want to do in a strongly typed language.
I would suggest you create an enum and parse the string value into an enum member.
You can use the Enum.Parse method to do that. It throws ArgumentException if the provided value is not a valid member.
using System;
class Program
{
enum MyEnum
{
FirstValue,
SecondValue,
ThirdValue,
FourthValue
}
public static void doStuff(string constValue)
{
var parsedValue = Enum.Parse(typeof(MyEnum), constValue);
Console.WriteLine($"Type: { parsedValue.GetType() }, value: { parsedValue }");
}
static void Main(string[] args)
{
doStuff("FirstValue"); // Runs
doStuff("FirstValuesss"); // Throws ArgumentException
}
}
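If the exception is not wanted, a non-throwing variant is possible with Enum.TryParse. The sketch below is only an illustration: EnumParsing and TryDoStuff are hypothetical names, and MyEnum repeats the members from the example above. The extra Enum.IsDefined check is there because Enum.TryParse also accepts numeric strings such as "42", even when no member has that value.

```csharp
using System;

public enum MyEnum
{
    FirstValue,
    SecondValue,
    ThirdValue,
    FourthValue
}

// Hypothetical helper wrapping the non-throwing parse.
public static class EnumParsing
{
    // Returns true only when constValue maps to a declared member of MyEnum.
    public static bool TryDoStuff(string constValue, out MyEnum parsed)
    {
        return Enum.TryParse(constValue, out parsed)
            && Enum.IsDefined(typeof(MyEnum), parsed);
    }
}
```

A caller can then branch on the result instead of wrapping the call in try/catch.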
One of my projects has a value type/struct that represents a custom identifier string for a video format. In this case, it's going to contain a content type string, but that can vary.
I've used a struct so it can be strongly typed when it's passed around, and so it can perform some sanity checks on the initial string value.
public struct VideoFormat {
private string contentType;
public VideoFormat(string contentType) {
this.contentType = contentType;
}
public string ContentType {
get { return this.contentType; }
}
public override string ToString() {
return this.contentType;
}
// various static methods for implicit conversion to/from strings, and comparisons
}
As there are a few very common formats, I've exposed these as static read only fields with default values.
public static readonly VideoFormat Unknown = new VideoFormat(string.Empty);
public static readonly VideoFormat JPEG = new VideoFormat("image/jpeg");
public static readonly VideoFormat H264 = new VideoFormat("video/h264");
Is it better to expose the common values as static read-only fields or as get-only properties? What if I want to change them later? I see both methods used throughout the .NET Framework, e.g. System.Drawing.Color uses static get-only properties while System.String has a static read-only field for String.Empty, and System.Int32 has a const for MinValue.
(Mostly copied from this question but with a more specific and not directly related question.)
Properties are a good idea unless you are declaring something that never changes.
With properties you can change the internal implementation without affecting programs that consume your library, and you can handle changes and variations. Consuming programs won't break and won't need to be recompiled.
e.g. (I know this is a bad example but you get the idea..)
public static VideoFormat H264Format
{
get{
// This if statement can be added in the future without breaking other programs.
if(SupportsNewerFormat)
return VideoFormat.H265;
return VideoFormat.H264;
}
}
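Also keep in mind that if you decide to change a public field to a property in the future, it is a breaking change: code compiled against the field must be recompiled, and some source-level uses (such as passing the field by ref) stop compiling.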
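For comparison, here is a sketch of the question's VideoFormat with the common values exposed as static get-only properties. This is one possible shape, not the original author's code: the readonly modifier, the null handling, and the s_h264 backing field are assumptions added for illustration.

```csharp
using System;

public readonly struct VideoFormat
{
    public VideoFormat(string contentType)
    {
        ContentType = contentType ?? string.Empty;
    }

    public string ContentType { get; }

    // default(VideoFormat) leaves ContentType null, so guard against it here.
    public override string ToString() => ContentType ?? string.Empty;

    // Property backed by an explicit field: the getter body can change later
    // without changing what callers are compiled against.
    private static readonly VideoFormat s_h264 = new VideoFormat("video/h264");
    public static VideoFormat H264 => s_h264;

    // Get-only auto-property: same public shape, initialized once.
    public static VideoFormat Jpeg { get; } = new VideoFormat("image/jpeg");
    public static VideoFormat Unknown { get; } = new VideoFormat(string.Empty);
}
```

Callers write VideoFormat.H264 either way, so the choice mostly affects how freely the library can evolve later.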
Say I have a struct defined as such
struct Student
{
int age;
int height;
char[] name; // 12 characters in the file
}
When I'm reading a binary file, it looks something like
List<Student> students = new List<Student>();
Student someStudent;
int num_students = myFile.readUInt32();
for (int i = 0; i < num_students; i++)
{
// read a student struct
}
How can I write my struct so that I just need to say something along the lines of
someStudent = new Student();
So that it will read the file in the order that the struct is defined, and allow me to get the values as needed with syntax like
someStudent.age;
I could define the Student as a class and have the constructor read data and populate them, but it wouldn't have any methods beyond getters/setters so I thought a struct would be more appropriate.
Or does it not matter whether I use a class or struct? I've seen others write C code using structs to read in blocks of data and figured it was a "good" way to do it.
There is not, AFAIK, a low-level direct-layout struct reader built into .NET. You would want to look at BinaryReader, reading each field in turn: basically, ReadInt32() twice, and ReadChars(). Pay particular attention to the encoding of the character data (ASCII? UTF-8? UTF-16?) and the endianness of the integers.
Personally, I'd look more at using a dedicated cross-platform serializer!
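A field-by-field read with BinaryReader might look roughly like the sketch below. The file layout is an assumption, since the question does not specify it: a little-endian uint32 count, then per student an int32 age, an int32 height, and exactly 12 bytes of NUL-padded ASCII for the name. StudentReader is a hypothetical name. The sketch uses ReadBytes rather than ReadChars so that exactly 12 bytes are consumed regardless of encoding.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public readonly struct Student
{
    public Student(int age, int height, string name)
    {
        Age = age;
        Height = height;
        Name = name;
    }

    public int Age { get; }
    public int Height { get; }
    public string Name { get; }
}

// Hypothetical reader for the assumed layout described above.
public static class StudentReader
{
    private const int NameLength = 12;

    public static List<Student> ReadAll(Stream stream)
    {
        var students = new List<Student>();
        // BinaryReader reads integers as little-endian; leave the stream open for the caller.
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            uint count = reader.ReadUInt32();
            for (uint i = 0; i < count; i++)
            {
                int age = reader.ReadInt32();
                int height = reader.ReadInt32();
                byte[] raw = reader.ReadBytes(NameLength);
                if (raw.Length < NameLength)
                {
                    throw new EndOfStreamException("Truncated student record.");
                }
                // Strip the NUL padding after decoding.
                string name = Encoding.ASCII.GetString(raw).TrimEnd('\0');
                students.Add(new Student(age, height, name));
            }
        }
        return students;
    }
}
```

Values are read into locals first and the struct is built through its constructor, so Student can stay immutable.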
If you want to serialize / deserialize the struct
If you want to read/write the entire struct to a binary file (serialization), I suggest you look at
https://stackoverflow.com/a/629120/141172
Or, if it is an option for you, follow @Marc's advice and use a cross-platform serializer. Personally I would suggest protobuf-net which just happens to have been written by @Marc.
If you are loading from an arbitrary file format
Just like a class, a struct can have a constructor that accepts multiple parameters. In fact, it is generally wise to not provide setters for a struct. Doing so allows the values of the struct to be changed after it is constructed, which generally leads to programming bugs because many developers fail to appreciate the fact that struct is a value type with value semantics.
I would suggest providing a single constructor to initialize your struct, reading the values from the file into temporary variables, and then constructing the struct with a constructor.
public struct MyStruct
{
public int Age { get; private set; }
public int Height { get; private set; }
private char[] name;
public char[] Name
{
get { return name; }
set
{
if (value.Length > 12) throw new Exception("Max length is 12");
name = value;
}
}
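// ": this()" initializes every field first, so the property setters can be used below.
public MyStruct(int age, int height, char[] name) : this()
{
Age = age;
Height = height;
Name = name;
}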
}
To dig further into the perils of mutable structs (ones that can be changed after initialized) I suggest
Why are mutable structs “evil”?
Look, I know static classes can't inherit or implement. The question is "what the heck is the right C# + OOP pattern to implement this?". "This" is described below:
I want to define a common set of both definition and implementation for a group of classes where all but one type should be static. Namely, I want to make some arbitrary base converters where each have exactly the same four members:
// Theoretical; static classes can't actually implement
interface IBaseConverter {
int Base { get; }
char[] Glyphs { get; }
int ToInt(string value);
string FromInt(int value);
}
// AND / OR (interface may be superfluous)
public class BaseConverter : IBaseConverter{
public BaseConverter(int Base, char[] Glyphs) {
this.Base = Base;
this.Glyphs = Glyphs;
}
public int Base { get; private set; }
public char[] Glyphs { get; private set;}
public int ToInt(string value) { /* shared logic... */ }
public string FromInt(int value) { /* shared logic... */ }
}
They can also share the exact same implementation logic based on the value of Base and the ordered collection of glyphs. For example a Base16Converter would have Base = 16 and glyphs = { '0', '1', ... 'E', 'F' }. I trust the FromInt and ToInt are self-explanatory. Obviously I wouldn't need to implement a converter for base 16, but I do need to implement one for an industry-specific base 36 (the 0 - Z glyphs of Code 39). As with the built-in conversion and string formatting functions such as [Convert]::ToInt32("123",16) these are emphatically static methods -- when the base and glyphs are pre-determined.
I want to keep an instance version that can be initialized with arbitrary glyphs and base, such as:
BaseConverter converter = new BaseConverter(7, new[]{ 'P', '!', 'U', '~', 'á', '9', ',' });
int anumber = converter.ToInt("~~!,U"); // Equals 8325
But I also want a static class for the Base36Code39Converter. Another way of putting this is that any static implementers just have hard-coded base and glyphs:
// Theoretical; static classes can't inherit
public static class Base36Code39Converter : BaseConverter {
private static char[] _glyphs = { '0', '1', ... 'Z' };
static Base36Code39Converter : base(36, _glyphs) { }
}
I can see why this wouldn't work for the compiler -- there is no vtable for static methods and all that. I understand that in C# a static class cannot implement interfaces or inherit from anything (other than object) (see Why Doesn't C# Allow Static Methods to Implement an Interface?, Why can't I inherit static classes?).
So what the heck is the "right" C# + OOP pattern to implement this?
The direction you're going in... Not so much a good idea.
I would suggest you emulate the pattern presented by the System.Text.Encoding.
It has public static properties of type Encoding which are standard implementations of the Encoding class for different types of text encoding.
ASCII: Gets an encoding for the ASCII (7-bit) character set.
BigEndianUnicode: Gets an encoding for the UTF-16 format that uses the big endian byte order.
Default: Gets an encoding for the operating system's current ANSI code page.
Unicode: Gets an encoding for the UTF-16 format using the little endian byte order.
UTF32: Gets an encoding for the UTF-32 format using the little endian byte order.
UTF7: Gets an encoding for the UTF-7 format.
UTF8: Gets an encoding for the UTF-8 format.
In your case, you would provide an abstract base class, rather than an interface, and expose your common implementations as static properties.
Developers would then have easy access to implementations for the common converters you provide, or they would be able to implement their own.
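A minimal sketch of that Encoding-style pattern follows. To keep it short it uses a concrete class rather than an abstract base, and the conversion logic is an illustrative guess at the "shared logic" the question elides. The Base36Code39 property name and the "0-9A-Z" glyph order are assumptions based on the question's description.

```csharp
using System;
using System.Text;

public class BaseConverter
{
    private readonly char[] _glyphs;

    public BaseConverter(int numberBase, char[] glyphs)
    {
        if (glyphs == null) throw new ArgumentNullException(nameof(glyphs));
        if (numberBase < 2 || glyphs.Length != numberBase)
            throw new ArgumentException("Need exactly one glyph per digit value.");
        Base = numberBase;
        _glyphs = (char[])glyphs.Clone(); // defensive copy
    }

    // Predefined instance exposed the way Encoding.UTF8 is (assumed glyph order).
    public static BaseConverter Base36Code39 { get; } =
        new BaseConverter(36, "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray());

    public int Base { get; }

    public int ToInt(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            int digit = Array.IndexOf(_glyphs, c);
            if (digit < 0) throw new FormatException($"Unknown glyph '{c}'.");
            result = checked(result * Base + digit); // throws on overflow
        }
        return result;
    }

    public string FromInt(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        var sb = new StringBuilder();
        do
        {
            // Prepend the least significant digit each round.
            sb.Insert(0, _glyphs[value % Base]);
            value /= Base;
        } while (value > 0);
        return sb.ToString();
    }
}
```

Callers get the convenience of BaseConverter.Base36Code39.ToInt("Z") without a static class, and can still construct a converter with arbitrary glyphs.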
You could always use composition. In this case, your static class would have an instance of the appropriate converter and just proxy any calls to that:
public static class Base36Code39Converter
{
private static BaseConverter _conv =
new BaseConverter(36, new[]{ '0', '1', ... 'Z' });
public static int ToInt(string val)
{
return _conv.ToInt(val);
}
}
Why do you want to make it static?
Singleton seems to be what you're looking for.
The answer was the Singleton pattern. See for example Implementing Singleton in C#.
Luiggi Mendoza provided this answer, which I marked as an answer, but then he deleted it for some reason. I'm reposting it for completeness.