How can I use ParameterExpression to parameterize an expression tree? - c#

I am learning to use expression trees/expressions in C#. I have gradually built up a parser with which I can take a string in "calculator" syntax (like "2 * 3 + 14 * 4 / 7 - 5 * 5") and build and evaluate an abstract syntax tree (AST). It even calculates the correct answer! :-) The AST consists of Expression nodes: arithmetic binary nodes (Add, Subtract, Multiply, Divide) and leaf Constant nodes representing the integer values.
Next step: I want to add parameters to the expression to be parsed, like "2 * 3 + myVar1 * 4 / 7 - 5 * myVar2", and supply the actual values for the parameters at runtime (after the AST has been compiled). I can easily add the ParameterExpressions to the tree, but I cannot figure out how to compile the tree correctly and supply the values.
The parser is built using Coco/R and an attributed Backus-Naur grammar, and looks like this:
using System.Linq.Expressions;
using Ex = System.Linq.Expressions.Expression;
using System;
namespace AtgKalk
{
public class Parser
{
public const int _EOF = 0;
public const int _identifikator = 1;
public const int _tall = 2;
public const int _pluss = 3;
public const int _minus = 4;
public const int _ganger = 5;
public const int _deler = 6;
public const int maxT = 7;
const bool T = true;
const bool x = false;
const int minErrDist = 2;
public Scanner scanner;
public Errors errors;
public Token t; // last recognized token
public Token la; // lookahead token
int errDist = minErrDist;
public Parser(Scanner scanner)
{
this.scanner = scanner;
errors = new Errors();
}
void SynErr(int n)
{
if (errDist >= minErrDist) errors.SynErr(la.line, la.col, n);
errDist = 0;
}
void Get()
{
for (; ; )
{
t = la;
la = scanner.Scan();
if (la.kind <= maxT) { ++errDist; break; }
la = t;
}
}
void Expect(int n)
{
if (la.kind == n) Get(); else { SynErr(n); }
}
void Calculator()
{
Ex n;
CalcExpr(out n);
Console.Write($"AST: {n} = ");
Console.WriteLine(Ex.Lambda<Func<int>>(n).Compile()());
// The above works fine as long as there are no parameter names in the input string
}
void CalcExpr(out Ex n1)
{
Ex n2; Func<Ex, Ex, Ex> f;
Term(out n1);
while (la.kind == 3 || la.kind == 4)
{
AddOp(out f);
Term(out n2);
n1 = f(n1, n2);
}
}
void Term(out Ex n1)
{
n1 = null; Ex n2; Func<Ex, Ex, Ex> f = null;
Fact(out n1);
while (la.kind == 5 || la.kind == 6)
{
MulOp(out f);
Fact(out n2);
n1 = f(n1, n2);
}
}
void AddOp(out Func<Ex, Ex, Ex> f)
{
f = null;
if (la.kind == 3)
{
Get();
f = (l, r) => Ex.Add(l, r);
}
else if (la.kind == 4)
{
Get();
f = (l, r) => Ex.Subtract(l, r);
}
else SynErr(8);
}
void Fact(out Ex n)
{
n = null;
if (la.kind == 2)
{
Number(out n);
}
else if (la.kind == 1)
{
Parameter(out n);
}
else SynErr(9);
}
void MulOp(out Func<Ex, Ex, Ex> f)
{
f = null;
if (la.kind == 5)
{
Get();
f = (l, r) => Ex.Multiply(l, r);
}
else if (la.kind == 6)
{
Get();
f = (l, r) => Ex.Divide(l, r);
}
else SynErr(10);
}
void Number(out Ex n)
{
Expect(2);
n = Ex.Constant(int.Parse(t.val), typeof(int));
}
void Parameter(out Ex n)
{
Expect(1);
n = Ex.Parameter(typeof(int), t.val);
}
public void Parse()
{
la = new Token();
la.val = "";
Get();
Calculator();
Expect(0);
}
static readonly bool[,] set = {
{T,x,x,x, x,x,x,x, x}
};
} // end Parser
public class Errors
{
public int count = 0; // number of errors detected
public System.IO.TextWriter errorStream = Console.Out; // error messages go to this stream
public string errMsgFormat = "-- line {0} col {1}: {2}"; // 0=line, 1=column, 2=text
public virtual void SynErr(int line, int col, int n)
{
string s;
switch (n)
{
case 0: s = "EOF expected"; break;
case 1: s = "identifikator expected"; break;
case 2: s = "tall expected"; break;
case 3: s = "pluss expected"; break;
case 4: s = "minus expected"; break;
case 5: s = "ganger expected"; break;
case 6: s = "deler expected"; break;
case 7: s = "??? expected"; break;
case 8: s = "invalid AddOp"; break;
case 9: s = "invalid Fakt"; break;
case 10: s = "invalid MulOp"; break;
default: s = "error " + n; break;
}
errorStream.WriteLine(errMsgFormat, line, col, s);
count++;
}
} // Errors
public class FatalError : Exception
{
public FatalError(string m) : base(m) { }
}
}
My problem lies in this line of Calculator(), I think:
Console.WriteLine(Ex.Lambda<Func<int>>(n).Compile()());
Invocation:
Scanner scanner = new Scanner(args[0]); // if args[0] contains the input string :-)
Parser parser = new Parser(scanner);
parser.Parse();

I have now solved my problem. Thanks to kaby76 for valuable tips that led me in the right direction. The example can now handle an arbitrary number of parameters (probably at most 16, since that is the maximum number of input arguments for Func<...>).
The solution to the problem was threefold:
Collect the parameters and supply this collection of parameters to the Lambda
Remove the explicit type arguments from the Lambda, letting it infer the delegate type
Use DynamicInvoke to execute the resulting Delegate
The problematic statement then looks like this, for an expression with two parameters:
Console.WriteLine(Ex.Lambda(n, para).Compile().DynamicInvoke(3, 4));
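The three steps above can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the parser keeps a name-to-parameter table, and the `ParameterTable` and `ParameterDemo` classes are hypothetical. Reusing one ParameterExpression per name matters because the original Parameter() method creates a new ParameterExpression for every occurrence, so an input such as "x * x" would produce two unrelated parameters that merely share a name; depending on how `para` is collected, the lambda would then either need two arguments or fail to compile.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical helper: one ParameterExpression per distinct name, in first-seen order.
public sealed class ParameterTable
{
    private readonly Dictionary<string, ParameterExpression> byName =
        new Dictionary<string, ParameterExpression>();
    private readonly List<ParameterExpression> inOrder = new List<ParameterExpression>();

    // Would be called from Parameter() instead of Expression.Parameter(...) directly.
    public ParameterExpression Get(string name)
    {
        ParameterExpression p;
        if (!byName.TryGetValue(name, out p))
        {
            p = Expression.Parameter(typeof(int), name);
            byName.Add(name, p);
            inOrder.Add(p);
        }
        return p;
    }

    // Lambda parameters, in the order the names first appeared in the input.
    public IReadOnlyList<ParameterExpression> Parameters => inOrder;
}

public static class ParameterDemo
{
    // Builds by hand the tree the parser would produce for
    // "2 * 3 + myVar1 * 4 / 7 - 5 * myVar2" and evaluates it.
    public static object Evaluate(int myVar1, int myVar2)
    {
        var table = new ParameterTable();
        Expression body = Expression.Subtract(
            Expression.Add(
                Expression.Multiply(Expression.Constant(2), Expression.Constant(3)),
                Expression.Divide(
                    Expression.Multiply(table.Get("myVar1"), Expression.Constant(4)),
                    Expression.Constant(7))),
            Expression.Multiply(Expression.Constant(5), table.Get("myVar2")));

        // No type arguments: Lambda infers the delegate type (here Func<int, int, int>).
        LambdaExpression lambda = Expression.Lambda(body, table.Parameters);
        return lambda.Compile().DynamicInvoke(myVar1, myVar2);
    }

    // "x * x": both occurrences resolve to the same ParameterExpression,
    // so the compiled delegate takes a single argument.
    public static object Square(int x)
    {
        var table = new ParameterTable();
        Expression body = Expression.Multiply(table.Get("x"), table.Get("x"));
        return Expression.Lambda(body, table.Parameters).Compile().DynamicInvoke(x);
    }
}
```

With myVar1 = 14 and myVar2 = 5 this evaluates the same arithmetic as the original constant-only example. When the number of parameters is fixed and known at compile time, `Expression.Lambda<Func<int, int, int>>(body, table.Parameters).Compile()` gives a strongly typed delegate and avoids the reflection overhead of DynamicInvoke.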

Related

C# Console Calculator with all input on one line

I am trying to make a small console calculator program where the whole input is entered on a single line. For example, the input "88+12*7/2" should be translated into the operation "((88+12)*7)/2", and the answer printed in the console.
I would be grateful if you could help me complete this code.
I have done part of the project, but it only works for one operator between two numbers:
static string Input_User()
{
string Input_User = Console.ReadLine();
return Input_User;
}
static void ShowMessage()
{
Console.WriteLine("Enter your numbers with operation like: 7*7");
}
ShowMessage();
string input_string = Input_User();
int result = PerformCalculation(InputToList(input_string));
Console.WriteLine($"{input_string}={result}");
static string[] InputToList(string input)
{
string number1 = "";
string number2 = "";
string Oprt = "";
string[] Arithmetic = new string[3];
int n = 0;
foreach (char charecter in input)
{
int num;
bool isNumerical = int.TryParse(charecter.ToString(), out num);
n += 1;
if (isNumerical)
{
number1 += num;
}
else
{
Oprt = charecter.ToString();
Arithmetic[0] = number1;
Arithmetic[1] = Oprt;
for (int i = n; i <= input.Length - 1; i++)
{
number2 += input[i];
}
Arithmetic[2] = number2;
}
}
return Arithmetic;
}
static int PerformCalculation(string[] Input)
{
int result = 0;
switch (Input[1])
{
case "+":
result = Int32.Parse(Input[0]) + Int32.Parse(Input[2]);
break;
case "-":
result = Int32.Parse(Input[0]) - Int32.Parse(Input[2]);
break;
case "*":
result = Int32.Parse(Input[0]) * Int32.Parse(Input[2]);
break;
case "/":
result = Int32.Parse(Input[0]) / Int32.Parse(Input[2]);
break;
}
return result;
}
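If your input were, for example, "5+5+5", it would not work, because your function InputToList does the following:
InputToList("5+5+5")
-> Return Value ["55","+","5+55"]
(number1 and number2 are never reset: at the second '+', number1 holds every digit seen so far, and the rest of the input is appended to number2 a second time.) The function PerformCalculation would then try to parse "5+55" as an integer, which is simply not possible: Int32.Parse throws a FormatException.
One solution would be to use regular expressions to filter and check the input.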
Then you could use a binary tree or a linked list in which you insert the numbers and operators.
After this you would be able to iterate over the list/tree and to do multiple operations.
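A minimal sketch of that idea, assuming the strictly left-to-right evaluation the question asks for (so "88+12*7/2" means ((88+12)*7)/2 = 350, not normal operator precedence). The class and method names are illustrative; the token list plays the role of the linked list mentioned above, and only unsigned integers and the four operators are accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LeftToRightCalculator
{
    // One token: optional leading whitespace, then an unsigned integer or an operator.
    private static readonly Regex Token = new Regex(@"\s*(\d+|[-+*/])");

    // Splits the input into a flat list such as ["88", "+", "12", "*", "7", "/", "2"].
    public static List<string> Tokenize(string input)
    {
        string s = input.Trim();
        var tokens = new List<string>();
        int pos = 0;
        while (pos < s.Length)
        {
            Match m = Token.Match(s, pos);
            // The next token must start exactly where the previous one ended.
            if (!m.Success || m.Index != pos)
                throw new FormatException($"Unexpected character at position {pos}");
            tokens.Add(m.Groups[1].Value);
            pos += m.Length;
        }
        return tokens;
    }

    // Applies the operators strictly left to right: ((88 + 12) * 7) / 2.
    public static int Evaluate(string input)
    {
        List<string> tokens = Tokenize(input);
        if (tokens.Count % 2 == 0)
            throw new FormatException("Expected: number (operator number)*");
        int result = int.Parse(tokens[0]);
        for (int i = 1; i < tokens.Count; i += 2)
        {
            int rhs = int.Parse(tokens[i + 1]);
            switch (tokens[i])
            {
                case "+": result += rhs; break;
                case "-": result -= rhs; break;
                case "*": result *= rhs; break;
                case "/": result /= rhs; break;
                default: throw new FormatException($"Expected an operator, got '{tokens[i]}'");
            }
        }
        return result;
    }
}
```

Normal precedence (multiplication before addition) would need a tree or a second pass over the list instead of this single loop.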

Convert double number to digits and vice versa?

I'm trying to convert a double number into an array of digits
Input:
double num
Output:
int[] arrDigit
int dotIdx
bool isMinus
for example:
Input:
double num = -69.69777
Output:
int[] arrDigit = { 7,7,7,9,6,9,6}
int dotIdx = 5
bool isMinus = true
And vice versa:
Input:
array of input digit commands
Output:
double num
for example:
Input:
Insert digit 6
Insert digit 9
Start dot
Insert digit 6
Insert digit 9
Insert digit 7
Insert digit 7
Insert digit 7
Output:
double num=69.69777
The easiest way is to use C# string methods; I've implemented it:
class DigitToNumTranslator
{
private bool m_isDot;
//Minus is handled as operator, not the job for translator
//Helper
private StringBuilder m_builder = new StringBuilder();
public double NumResult
{
get
{
return double.Parse(m_builder.ToString(), System.Globalization.CultureInfo.InvariantCulture);
}
}
public void Reset()
{
m_builder.Clear();
m_isDot = false;
}
public void StartDot()
{
if (!m_isDot)
{
m_isDot = true;
m_builder.Append('.');
}
}
public void InsertDigit(int digit)
{
m_builder.Append(digit.ToString());
}
}
class NumToDigitTranslator
{
private List<int> m_lstDigit;
private IList<int> m_lstDigitReadOnly;
private int m_dotIdx;
private bool m_isMinus;
public IList<int> LstDigit => m_lstDigitReadOnly;
public int DotIdx => m_dotIdx;
public bool IsMinus => m_isMinus;
public NumToDigitTranslator()
{
m_lstDigit = new List<int>();
m_lstDigitReadOnly = m_lstDigit.AsReadOnly();
}
public void Translate(double num)
{
m_lstDigit.Clear();
m_dotIdx = 0;
m_isMinus = false;
var szNum = num.ToString(System.Globalization.CultureInfo.InvariantCulture);
//Won't work if it's 1E+17
for (var i = 0; i < szNum.Length; ++i)
{
if (char.IsNumber(szNum[i]))
m_lstDigit.Add(int.Parse(szNum[i].ToString()));
else if (szNum[i] == '-')
m_isMinus = true;
else if (szNum[i] == '.')
m_dotIdx = i;
}
//Reverse for display
if (m_dotIdx != 0)
m_dotIdx = szNum.Length - 1 - m_dotIdx;
m_lstDigit.Reverse();
}
}
But the string method runs into the "1E+17" issue (when the number is too large, it is formatted in scientific notation). I don't like the string method very much because it may have unexpected bugs (e.g. CultureInfo, 1E+17, ...), and who knows whether there are more cases I haven't found; that is too risky. Also, my application doesn't use strings to display numbers: it combines sprite images to draw them.
So I'd like to try the math method:
class DigitToNumTranslatorRaw
{
private double m_numResult;
private bool m_isDot;
private int m_dotIdx;
public double NumResult => m_numResult;
public void Reset()
{
m_numResult = 0;
m_dotIdx = 1;
m_isDot = false;
}
public void StartDot()
{
m_isDot = true;
}
public void InsertDigit(int digit)
{
if (m_isDot)
{
m_numResult += digit * Math.Pow(10, -m_dotIdx);
++m_dotIdx;
}
else
{
m_numResult *= 10;
m_numResult += digit;
}
}
}
class NumToDigitTranslatorRaw
{
private List<int> m_lstDigit;
private IList<int> m_lstDigitReadOnly;
private int m_dotIdx;
public IList<int> LstDigit => m_lstDigitReadOnly;
public int DotIdx => m_dotIdx;
public NumToDigitTranslatorRaw()
{
m_lstDigit = new List<int>();
m_lstDigitReadOnly = m_lstDigit.AsReadOnly();
}
public void Translate(double num)
{
m_dotIdx = 0;
m_lstDigit.Clear();
//WIP (work with int, but not with double, thus failed to get the numbers after dot)
var intNum = (int)num;
while (num > 10)
{
m_lstDigit.Add((intNum % 10));
num /= 10;
}
if (m_lstDigit.Count > 0)
m_lstDigit.Reverse();
else
m_lstDigit.Add(0);
}
}
But I meet with 2 problems:
In DigitToNumTranslatorRaw, I don't know whether it's better than the string solution: m_numResult += digit * Math.Pow(10, -m_dotIdx);, num /= 10;, etc. may cause floating-point precision problems, and is Pow the best choice for performance?
In NumToDigitTranslatorRaw, I'm still not able to get the number after dot.
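On the first problem, one way to avoid accumulating Math.Pow rounding errors is to collect all digits into a single integer mantissa and divide by a power of ten once, at the end. IEEE 754 division is correctly rounded, and both operands are exact when the mantissa has at most 15 digits (below 2^53) and there are at most 22 fractional digits (10^22 is the largest exactly representable power of ten), so the result is the double nearest to the typed value. This is a sketch under those assumptions; `DigitToNumTranslatorExact` is a hypothetical variant of the question's class, not an existing API.

```csharp
using System;

// Hypothetical variant of the question's DigitToNumTranslatorRaw.
public sealed class DigitToNumTranslatorExact
{
    // 1e0..1e22 are all exactly representable as doubles.
    private static readonly double[] Pow10 =
    {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
        1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
    };
    private const int MaxDigits = 15; // keeps the mantissa below 2^53, so it converts exactly

    private long m_mantissa;   // all digits typed so far, as one integer
    private int m_digitCount;  // significant digits (leading integer zeros are not counted)
    private int m_fracDigits;  // digits typed after the dot
    private bool m_isDot;

    public void Reset()
    {
        m_mantissa = 0;
        m_digitCount = 0;
        m_fracDigits = 0;
        m_isDot = false;
    }

    public void StartDot() { m_isDot = true; }

    public void InsertDigit(int digit)
    {
        if (digit < 0 || digit > 9) throw new ArgumentOutOfRangeException(nameof(digit));
        if (m_digitCount == MaxDigits) throw new InvalidOperationException("Too many digits for an exact result");
        m_mantissa = m_mantissa * 10 + digit;
        if (m_mantissa != 0 || m_isDot) m_digitCount++;
        if (m_isDot) m_fracDigits++;
    }

    // One correctly rounded division instead of a running sum of Math.Pow terms.
    public double NumResult => m_mantissa / Pow10[m_fracDigits];
}
```

The table lookup also replaces the repeated Math.Pow calls, which addresses the performance part of the question without any string handling.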
I tried to look at Microsoft's TryParse source code to see how they do it, but it's too complicated and I couldn't find where that code lives.
So my purpose is:
Math method: write DigitToNumTranslatorRaw and NumToDigitTranslatorRaw and make sure they are bug-free, floating-point accurate and faster than the string method (because then I don't have to deal with CultureInfo.InvariantCulture, 1E+17, ...).
If the math method is too hard, I'll just use the string methods DigitToNumTranslator and NumToDigitTranslator and deal with each string problem (e.g. a long number turning into 1E+17). The problem is that I don't know whether I've covered all of them: I found the 1E+17 case by random testing and the CultureInfo problem by searching Stack Overflow, and the docs don't list all the problems I may encounter.
Code usage example:
Digit to number:
private DigitToNumTranslator m_digit = new DigitToNumTranslator();
m_digit.Reset();
var isEnd = false;
//m_lstInputKey is a list of enum E_INPUT_KEY, created earlier by user input
for (; i < m_lstInputKey.Count; ++i)
{
switch (m_lstInputKey[i])
{
case E_INPUT_KEY.NUM_0: m_digit.InsertDigit(0); break;
case E_INPUT_KEY.NUM_1: m_digit.InsertDigit(1); break;
case E_INPUT_KEY.NUM_2: m_digit.InsertDigit(2); break;
case E_INPUT_KEY.NUM_3: m_digit.InsertDigit(3); break;
case E_INPUT_KEY.NUM_4: m_digit.InsertDigit(4); break;
case E_INPUT_KEY.NUM_5: m_digit.InsertDigit(5); break;
case E_INPUT_KEY.NUM_6: m_digit.InsertDigit(6); break;
case E_INPUT_KEY.NUM_7: m_digit.InsertDigit(7); break;
case E_INPUT_KEY.NUM_8: m_digit.InsertDigit(8); break;
case E_INPUT_KEY.NUM_9: m_digit.InsertDigit(9); break;
case E_INPUT_KEY.NUM_DOT: m_digit.StartDot(); break;
default: isEnd = true; break;
}
if (isEnd) break;
}
Console.WriteLine(m_digit.NumResult);
Number to digit:
private NumToDigitTranslator m_numToDigitTranslator = new NumToDigitTranslator();
double dInputNumber = 6969696969696969696996.69696969696969D;
m_numToDigitTranslator.Translate(dInputNumber);
//Draw function is how you draw the information to the screen
DrawListDigit(m_numToDigitTranslator.LstDigit);
DrawMinus(m_numToDigitTranslator.IsMinus);
DrawDot(m_numToDigitTranslator.DotIdx);
Math solution
Code:
#region MATH_WAY
class DigitToNumTranslatorMath
{
private double m_numResult;
private bool m_isDot;
private int m_dotIdx;
public double NumResult => m_numResult;
public void Reset()
{
m_numResult = 0;
m_dotIdx = 1;
m_isDot = false;
}
public void StartDot()
{
m_isDot = true;
}
public void InsertDigit(int digit)
{
if (m_isDot)
{
m_numResult += digit * Math.Pow(10, -m_dotIdx);
++m_dotIdx;
}
else
{
m_numResult *= 10;
m_numResult += digit;
}
}
}
//Bug: (num - Math.Truncate(num))
//==> floating point problem
//==> 1.9D - Math.Truncate(1.9D) = 0.89999999999999991 (Expected: 0.9)
class NumToDigitTranslatorMath
{
private List<int> m_lstDigit;
private IList<int> m_lstDigitReadOnly;
private int m_dotIdx;
private bool m_isMinus;
public IList<int> LstDigit => m_lstDigitReadOnly;
public int DotIdx => m_dotIdx;
public bool IsMinus => m_isMinus;
public NumToDigitTranslatorMath()
{
m_lstDigit = new List<int>();
m_lstDigitReadOnly = m_lstDigit.AsReadOnly();
}
public void Translate(double num)
{
m_dotIdx = 0;
m_lstDigit.Clear();
m_isMinus = num < 0;
num = Math.Abs(num);//Extract digits from the magnitude; the sign is reported via IsMinus
int intDigit;
double intNum;//Use double type to prevent casting a too big double for int which causes overflow
//Get the digits on the right of dot
const int NUM_COUNT_AFTER_DOT = 1000000000;//double has Precision 15-16 digits, but I only need 9 digits
//Math.Truncate(-1.9)=>-1; Math.Floor(-1.9)=>-2;
intNum = Math.Truncate((num - Math.Truncate(num)) * NUM_COUNT_AFTER_DOT);//Floating point bug here!!!
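int fracDigitCount = intNum > 0 ? 9 : 0;//9 = number of zeros in NUM_COUNT_AFTER_DOT
//Remove trailing zeros
while (intNum > 0 && intNum % 10 == 0)
{
intNum = Math.Truncate(intNum / 10);
--fracDigitCount;
}
//Emit exactly fracDigitCount digits, so zeros right after the dot (e.g. 0.05) are kept
while (m_dotIdx < fracDigitCount)
{
intDigit = (int)(intNum % 10);
intNum = Math.Truncate(intNum / 10);
m_lstDigit.Add(intDigit);
++m_dotIdx;
}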
//Get the digits on the left of dot
intNum = Math.Truncate(num);
while (intNum > 0)
{
intDigit = (int)(intNum % 10);
intNum = Math.Truncate(intNum / 10);
m_lstDigit.Add(intDigit);
}
if (m_lstDigit.Count == m_dotIdx)//No digits left of the dot (e.g. 0.5): add the leading 0
m_lstDigit.Add(0);
}
}
#endregion
Note: There is the floating point problem, for example: 1.9D - Math.Truncate(1.9D) = 0.89999999999999991 (Expected: 0.9).
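One way to avoid that particular error is to do the digit extraction in System.Decimal instead of double, a swapped-in technique rather than the approach above. The explicit double-to-decimal conversion rounds to 15 significant digits, so 1.9 becomes exactly 1.9m, and decimal subtraction and multiplication by 10 are then exact. A sketch under these assumptions (the `NumToDigitDecimal` name is hypothetical; it returns the digits in the same reversed order as the question's arrDigit): values outside decimal's range (about ±7.9e28), NaN and infinity make the conversion throw OverflowException, and any digits beyond the 15th significant one are rounded away.

```csharp
using System;
using System.Collections.Generic;

public static class NumToDigitDecimal
{
    // Returns digits least-significant first (as in the question's arrDigit),
    // the number of digits after the dot, and the sign.
    public static (List<int> Digits, int DotIdx, bool IsMinus) Translate(double num)
    {
        decimal d = (decimal)num;          // rounds to 15 significant digits
        bool isMinus = d < 0;
        d = Math.Abs(d);

        decimal intPart = decimal.Truncate(d);
        decimal frac = d - intPart;        // exact in decimal: 1.9m - 1m == 0.9m

        // Fractional digits, most significant first; stops when nothing is left.
        var fracDigits = new List<int>();
        while (frac != 0)
        {
            frac *= 10;
            int digit = (int)decimal.Truncate(frac);
            fracDigits.Add(digit);
            frac -= digit;
        }

        var digits = new List<int>();
        for (int i = fracDigits.Count - 1; i >= 0; i--)
            digits.Add(fracDigits[i]);     // reversed, to match arrDigit

        // Integer digits, least significant first; always at least one digit.
        do
        {
            digits.Add((int)(intPart % 10));
            intPart = decimal.Truncate(intPart / 10);
        } while (intPart > 0);

        return (digits, fracDigits.Count, isMinus);
    }
}
```

Since decimal has no exponent notation and no culture involvement, this also sidesteps the 1E+17 and CultureInfo concerns for values within its range.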
I was planning to port the relevant code from the .NET source to implement it the math way, but I was too lazy, so I'll just use the string solution.
String Solution:
Code:
static class CONST_STR_FORMAT
{
private static System.Globalization.CultureInfo s_ciCommon = System.Globalization.CultureInfo.InvariantCulture;
public static System.Globalization.CultureInfo CI_COMMON => s_ciCommon;
//source: https://stackoverflow.com/questions/1546113/double-to-string-conversion-without-scientific-notation
public const string FORMAT_DOUBLE = "0.###################################################################################################################################################################################################################################################################################################################################################";
}
class DigitToNumTranslator
{
private bool m_isDot;
//Minus is handled as operator, not the job for translator
//Helper
private StringBuilder m_builder = new StringBuilder();
public double NumResult
{
get
{
return double.Parse(m_builder.ToString(), CONST_STR_FORMAT.CI_COMMON);
}
}
public void Reset()
{
m_builder.Clear();
m_isDot = false;
}
public void StartDot()
{
if (!m_isDot)
{
m_isDot = true;
m_builder.Append('.');
}
}
public void InsertDigit(int digit)
{
m_builder.Append(digit);
}
}
class NumToDigitTranslator
{
private List<int> m_lstDigit;
private IList<int> m_lstDigitReadOnly;
private int m_dotIdx;
private bool m_isMinus;
public IList<int> LstDigit => m_lstDigitReadOnly;
public int DotIdx => m_dotIdx;
public bool IsMinus => m_isMinus;
public NumToDigitTranslator()
{
m_lstDigit = new List<int>();
m_lstDigitReadOnly = m_lstDigit.AsReadOnly();
}
public void Translate(double num)
{
m_lstDigit.Clear();
m_dotIdx = 0;
m_isMinus = false;
var szNum = num.ToString(CONST_STR_FORMAT.FORMAT_DOUBLE, CONST_STR_FORMAT.CI_COMMON);
for (var i = 0; i < szNum.Length; ++i)
{
if (char.IsNumber(szNum[i]))
m_lstDigit.Add(int.Parse(szNum[i].ToString()));
else if (szNum[i] == '-')
m_isMinus = true;
else if (szNum[i] == '.')
m_dotIdx = i;
}
//Reverse for display
if (m_dotIdx != 0)
m_dotIdx = szNum.Length - 1 - m_dotIdx;
m_lstDigit.Reverse();
}
}
Note: No more headaches. What I fear most are culture-dependent bugs (bugs that happen on some devices but not on mine); I hope that using System.Globalization.CultureInfo.InvariantCulture keeps that nightmare from happening.

Marshal a va_list

I have the following code:
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate void PanicFuncDelegate(string str, IntPtr args);
private void PanicFunc(string str, IntPtr args)
{
LogFunc("PANIC", str, args);
}
public void LogFunc(string severity, string str, IntPtr args)
{
vprintf($"[{severity}] "+ str,args);
}
[DllImport("libc.so.6")]
private static extern int vprintf(string format, IntPtr args);
This prints correctly formatted messages to the console. I want to retrieve the values from args to use them in my own logger.
If I try to read the value of each pointer from the array in args (as suggested here: Marshal va_list in C# delegate), I get a segmentation fault.
Any suggestions?
I have a function call with this working, here's what I do:
For the DllImport I use an __arglist to marshal to the va_list:
[DllImport("libc.so.6")]
private static extern int vprintf(string format, __arglist);
Then when calling the function I create the __arglist,
vprintf(string format, __arglist(arg1, arg2, arg3...))
Of course, you would need to either call the function with all the arguments statically or build that __arglist dynamically; I don't have the code here, but it's possible.
I wonder if you get a segmentation fault because the elements in the object[] are not pinned? Maybe if you pin the object[] and all elements within that would help? Just a guess though.
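Another plausible cause worth checking: the question loads libc.so.6, and on x86-64 Linux the System V AMD64 ABI defines va_list not as a flat array of argument slots but as a one-element array of a struct (gp_offset, fp_offset, overflow_arg_area, reg_save_area). The IntPtr the callback receives would then point at that struct, so indexing it like an array dereferences garbage. The sketch below mimics the ABI's va_arg algorithm for integer-class and double arguments. It assumes x86-64 Linux (it does not apply to Windows, 32-bit x86 or ARM64), the `SysVVaListReader` name is hypothetical, and it reads a private copy of the struct so the native va_list is not advanced; it should still be read before vprintf consumes the same va_list.

```csharp
using System;
using System.Runtime.InteropServices;

// One va_list element under the System V AMD64 ABI (x86-64 Linux).
[StructLayout(LayoutKind.Sequential)]
public struct SysVVaListTag
{
    public uint gp_offset;            // next general-purpose slot in reg_save_area (0..48)
    public uint fp_offset;            // next SSE slot in reg_save_area (48..176)
    public IntPtr overflow_arg_area;  // next argument passed on the stack
    public IntPtr reg_save_area;      // spilled argument registers
}

// Hypothetical reader following the ABI's va_arg algorithm on a managed copy.
public sealed class SysVVaListReader
{
    private SysVVaListTag tag;

    public SysVVaListReader(IntPtr vaList)
    {
        // Copy the struct; advancing our copy leaves the native va_list untouched.
        tag = Marshal.PtrToStructure<SysVVaListTag>(vaList);
    }

    // Integer-class argument (int, long, char*, ...): each occupies an 8-byte slot.
    public long NextInt64()
    {
        long value;
        if (tag.gp_offset <= 48 - 8)
        {
            value = Marshal.ReadInt64(tag.reg_save_area, (int)tag.gp_offset);
            tag.gp_offset += 8;
        }
        else
        {
            value = Marshal.ReadInt64(tag.overflow_arg_area);
            tag.overflow_arg_area += 8;
        }
        return value;
    }

    // A 32-bit int lives in the low half of its slot.
    public int NextInt32() => unchecked((int)NextInt64());

    public IntPtr NextPointer() => new IntPtr(NextInt64());

    // double argument (float is promoted to double for varargs).
    public double NextDouble()
    {
        long bits;
        if (tag.fp_offset <= 176 - 16)
        {
            bits = Marshal.ReadInt64(tag.reg_save_area, (int)tag.fp_offset);
            tag.fp_offset += 16;
        }
        else
        {
            bits = Marshal.ReadInt64(tag.overflow_arg_area);
            tag.overflow_arg_area += 8;
        }
        return BitConverter.Int64BitsToDouble(bits);
    }
}
```

Combined with a format-string parser such as the Printf class further down, `%d` would map to NextInt32, `%s` to Marshal.PtrToStringAnsi(NextPointer()) and `%f` to NextDouble.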
Just think about how a C program gets variables from a va_list, and the solution follows:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
namespace VaTest
{
class Program
{
static void Main(string[] args)
{
MarshalVaArgs(vaList => vprintf("%c%d%s", vaList), false, 'a', 123, "bc");
}
[DllImport("msvcrt")] //windows
//[DllImport("c")] //linux
private static extern int vprintf(string format, IntPtr vaList);
private static int IntSizeOf(Type t)
{
return (Marshal.SizeOf(t) + IntPtr.Size - 1) & ~(IntPtr.Size - 1);
}
public static void MarshalVaArgs(Action<IntPtr> action, bool? isUnicode, params object[] args)
{
var sizes = new int[args.Length];
for (var i = 0; i < args.Length; i++)
{
sizes[i] = args[i] is string ? IntPtr.Size : IntSizeOf(args[i].GetType());
}
var allocs = new List<IntPtr>();
var offset = 0;
var result = Marshal.AllocHGlobal(sizes.Sum());
allocs.Add(result);
for (var i = 0; i < args.Length; i++)
{
if (args[i] is string)
{
var s = (string)args[i];
var data = default(IntPtr);
if (isUnicode.HasValue)
{
if (isUnicode.Value)
{
data = Marshal.StringToHGlobalUni(s);
}
else
{
data = Marshal.StringToHGlobalAnsi(s);
}
}
else
{
data = Marshal.StringToHGlobalAuto(s);
}
allocs.Add(data);
Marshal.WriteIntPtr(result, offset, data);
offset += sizes[i];
}
else
{
Marshal.StructureToPtr(args[i], result + offset, false);
offset += sizes[i];
}
}
action(result);
foreach (var ptr in allocs)
{
Marshal.FreeHGlobal(ptr);
}
}
}
}
The code is written and tested with .NET Core 3.0 preview 5, compatible with .NET Framework 4.0 and C# 3.0.
Outputs:
a123bc
As this isn't solved yet, I'm posting a long solution that worked for me.
I found the solution in an abandoned project
https://github.com/GoaLitiuM/libobs-sharp
Use like this (tested with FFmpeg):
var objects = va_list_Helper.VaListToArray(format, va_List_Ptr);
// format: frame=%4d QP=%.2f NAL=%d Slice:%c Poc:%-3d I:%-4d P:%-4d SKIP:%-4d size=%d bytes%s
// format (filled): frame= 3 QP=13.00 NAL=0 Slice:B Poc:4 I:0 P:8 SKIP:912 size=32 bytes
// va_List objects: 3, 13, 0, 'B', 4, 0, 8, 912, 32
The classes needed:
public class va_list_Helper
{
public static unsafe object[] VaListToArray(string format, byte* va_list)
{
var vaList = new va_list((IntPtr)va_list);
return vaList.GetObjectsByFormat(format);
}
}
public static class Printf
{
// used
public static string[] GetFormatSpecifiers(string format)
{
if (format.IndexOf('%') == -1)
return null;
// find specifiers from format string
List<int> indices = new List<int>();
for (int j = 0; j < format.Length; j++)
{
j = format.IndexOf('%', j);
if (j == -1)
break;
indices.Add(j);
if (j + 1 < format.Length && format[j + 1] == '%') // ignore "%%"; the bounds check avoids reading past a trailing '%'
j++;
}
if (indices.Count == 0)
return null;
List<string> formats = new List<string>(indices.Count);
for (int mi = 0; mi < indices.Count; mi++)
{
string formatSpecifier = format.Substring(indices[mi], (mi + 1 < indices.Count ? indices[mi + 1] : format.Length) - indices[mi]);
if (!string.IsNullOrWhiteSpace(formatSpecifier))
formats.Add(formatSpecifier);
}
return formats.ToArray();
}
public class FormatSpecificationInfo
{
public string specification;
//public int parameter;
public char type;
public int width;
public int precision;
public FormatFlags flags;
};
[Flags]
public enum FormatFlags
{
// Type length
IsLong = 0x0001, // l
IsLongLong = 0x0002, // ll
IsShort = 0x0004, // h
IsChar = 0x0008, // hh
IsLongDouble = 0x0010, // L (must not overlap the other length bits)
// Flags
LeftAlign = 0x0100, // '-' left align within the width
Sign = 0x0200, // '+' use - or + signs for signed types
Alternate = 0x0400, // '#' prefix non-zero values with hex types
ZeroPad = 0x0800, // '0' pad with zeros
Blank = 0x1000, // ' ' pad sign with blank
Grouping = 0x2000, // '\'' group by thousands
ArchSize = 0x4000, // '?' use arch precision
// Dynamic parameters
DynamicWidth = 0x10000,
DynamicPrecision = 0x20000,
};
// used
public static FormatSpecificationInfo GetFormatSpecifierInfo(string specification)
{
if (string.IsNullOrWhiteSpace(specification))
return null;
FormatSpecificationInfo info = new FormatSpecificationInfo()
{
type = '\0',
width = int.MinValue,
precision = 6,
};
string width = "";
string precision = "";
int start = -1;
int fsLength = 1;
// TODO: parse parameter index
for (int i = 0; i < specification.Length && info.type == '\0'; i++)
{
char c = specification[i];
switch (c)
{
case '%':
if (start == -1)
start = i;
else
info.type = c;
info.specification = specification.Substring(start, i + 1 - start);
fsLength = i + 1;
break;
// flags
case '-':
info.flags |= FormatFlags.LeftAlign;
break;
case '+':
info.flags |= FormatFlags.Sign;
break;
case ' ':
info.flags |= FormatFlags.Blank;
break;
case '#':
info.flags |= FormatFlags.Alternate;
break;
case '\'':
info.flags |= FormatFlags.Grouping;
break;
case '?':
info.flags |= FormatFlags.ArchSize;
break;
// precision
case '.':
{
for (int j = i + 1; j < specification.Length; j++)
{
if (specification[j] == '*')
info.flags |= FormatFlags.DynamicPrecision;
else if (char.IsNumber(specification[j]))
precision += specification[j];
else
break;
i++;
}
}
break;
// length flags
case 'h':
info.flags += (int)FormatFlags.IsShort;
break;
case 'l':
info.flags += (int)FormatFlags.IsLong;
break;
case 'L':
info.flags |= FormatFlags.IsLongDouble;
break;
case 'z':
case 'j':
case 't':
// not supported
break;
// dynamic width
case '*':
info.flags |= FormatFlags.DynamicWidth;
break;
default:
{
if (char.IsNumber(c))
{
if (width == "" && c == '0')
info.flags |= FormatFlags.ZeroPad;
else
width += c;
}
else if (char.IsLetter(c) && info.type == '\0')
{
info.type = c;
info.specification = specification.Substring(start, i + 1 - start);
fsLength = i + 1;
}
}
break;
}
}
// sign overrides space
if (info.flags.HasFlag(FormatFlags.Sign) && info.flags.HasFlag(FormatFlags.Blank))
info.flags &= ~FormatFlags.Blank;
if (info.flags.HasFlag(FormatFlags.LeftAlign) && info.flags.HasFlag(FormatFlags.ZeroPad))
info.flags &= ~FormatFlags.ZeroPad;
// unsupported precision for these types
if (info.type == 's' ||
info.type == 'c' ||
Char.ToUpper(info.type) == 'X' ||
info.type == 'o')
{
info.precision = int.MinValue;
}
if (!string.IsNullOrWhiteSpace(precision))
info.precision = Convert.ToInt32(precision);
if (!string.IsNullOrWhiteSpace(width))
info.width = Convert.ToInt32(width);
return info;
}
}
public class va_list
{
internal IntPtr instance; //unmanaged pointer to va_list
public va_list(IntPtr ptr)
{
instance = ptr;
}
/// <summary> Returns unmanaged pointer to argument list. </summary>
public IntPtr GetPointer()
{
return instance;
}
/// <summary> Returns array of objects with help of printf format string. </summary>
/// <param name="format"> printf format string. </param>
public object[] GetObjectsByFormat(string format)
{
return GetObjectsByFormat(format, this);
}
public static unsafe object[] GetObjectsByFormat(string format, va_list va_list)
{
string[] formatSpecifiers = Printf.GetFormatSpecifiers(format);
if (formatSpecifiers == null || va_list == null || va_list.GetPointer() == IntPtr.Zero)
return null;
IntPtr args = va_list.GetPointer();
List<object> objects = new List<object>(formatSpecifiers.Length);
//var bytesDebug = new byte[format.Length];
//Marshal.Copy(va_list.GetPointer(), bytesDebug, 0, bytesDebug.Length);
int offset = 0;
foreach (string spec in formatSpecifiers)
{
var info = Printf.GetFormatSpecifierInfo(spec);
if (info.type == '\0')
continue;
// dynamic width and precision arguments
// these are stored in stack before the actual value
if (info.flags.HasFlag(Printf.FormatFlags.DynamicWidth))
{
int widthArg = Marshal.ReadInt32(args, offset);
objects.Add(widthArg);
offset += Marshal.SizeOf(typeof(IntPtr));
}
if (info.flags.HasFlag(Printf.FormatFlags.DynamicPrecision))
{
int precArg = Marshal.ReadInt32(args, offset);
objects.Add(precArg);
offset += Marshal.SizeOf(typeof(IntPtr));
}
int iSize = info.flags.HasFlag(Printf.FormatFlags.IsLongLong)
? Marshal.SizeOf(typeof(Int64)) : Marshal.SizeOf(typeof(IntPtr));
// marshal objects from pointer
switch (info.type)
{
// 8/16-bit integers
// char / wchar_t (promoted to int)
case 'c':
char c = (char)Marshal.ReadByte(args, offset);
objects.Add(c);
//offset += Marshal.SizeOf(typeof(Int32));
offset += Marshal.SizeOf(typeof(IntPtr));
break;
// signed integers
case 'd':
case 'i':
{
if (info.flags.HasFlag(Printf.FormatFlags.IsShort)) // h
{
short sh = (short)Marshal.ReadInt32(args, offset);
objects.Add(sh);
offset += Marshal.SizeOf(typeof(Int32));
}
else if (info.flags.HasFlag(Printf.FormatFlags.IsLongLong)) // ll
{
long l = Marshal.ReadInt64(args, offset);
objects.Add(l);
offset += iSize;
}
else // int and long types
{
var i = Marshal.ReadInt32(args, offset);
objects.Add(i);
offset += iSize;
}
}
break;
// unsigned integers
case 'u':
case 'o':
case 'x':
case 'X':
{
if (info.flags.HasFlag(Printf.FormatFlags.IsShort)) // h
{
ushort su = (ushort)Marshal.ReadInt32(args, offset);
objects.Add(su);
offset += Marshal.SizeOf(typeof(Int32));
}
else if (info.flags.HasFlag(Printf.FormatFlags.IsLongLong)) // ll
{
ulong lu = (ulong)(long)Marshal.ReadInt64(args, offset);
objects.Add(lu);
offset += iSize;
}
else // uint and ulong types
{
uint u = (uint)Marshal.ReadInt32(args, offset);
objects.Add(u);
offset += iSize;
}
}
break;
// floating-point types
case 'f':
case 'F':
case 'e':
case 'E':
case 'g':
case 'G':
{
if (info.flags.HasFlag(Printf.FormatFlags.IsLongDouble)) // L
{
// not really supported but read it as long
long lfi = Marshal.ReadInt64(args, offset);
double d = *(double*)(void*)&lfi;
objects.Add(d);
offset += Marshal.SizeOf(typeof(double));
}
else // double
{
long lfi = Marshal.ReadInt64(args, offset);
double d = *(double*)(void*)&lfi;
objects.Add(d);
offset += Marshal.SizeOf(typeof(double));
}
}
break;
// string
case 's':
{
string s = null;
// same:
//var addr1 = new IntPtr(args.ToInt64() + offset);
//var intPtr4 = Marshal.ReadIntPtr(addr1);
var intPtr3 = Marshal.ReadIntPtr(args, offset);
if (info.flags.HasFlag(Printf.FormatFlags.IsLong))
{
s = Marshal.PtrToStringUni(intPtr3);
}
else
{
s = Marshal.PtrToStringAnsi(intPtr3);
}
objects.Add(s);
offset += Marshal.SizeOf(typeof(IntPtr));
}
break;
// pointer
case 'p':
IntPtr ptr = Marshal.ReadIntPtr(args, offset);
objects.Add(ptr);
offset += Marshal.SizeOf(typeof(IntPtr));
break;
// non-marshallable types, ignored
case ' ':
case '%':
case 'n':
break;
default:
throw new ApplicationException("printf specifier '%" + info.type + "' not supported");
}
}
return objects.ToArray();
}
}

Natural Sort Order in C#

Does anyone have a good resource for, or can provide a sample of, a natural-order sort in C# for a FileInfo array? I am implementing the IComparer interface in my sorts.
The easiest thing to do is just P/Invoke the built-in function in Windows, and use it as the comparison function in your IComparer:
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
private static extern int StrCmpLogicalW(string psz1, string psz2);
Michael Kaplan has some examples of how this function works here, and the changes that were made for Vista to make it work more intuitively. The plus side of this function is that it will have the same behaviour as the version of Windows it runs on, however this does mean that it differs between versions of Windows so you need to consider whether this is a problem for you.
So a complete implementation would be something like:
[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
public static extern int StrCmpLogicalW(string psz1, string psz2);
}
public sealed class NaturalStringComparer : IComparer<string>
{
public int Compare(string a, string b)
{
return SafeNativeMethods.StrCmpLogicalW(a, b);
}
}
public sealed class NaturalFileInfoNameComparer : IComparer<FileInfo>
{
public int Compare(FileInfo a, FileInfo b)
{
return SafeNativeMethods.StrCmpLogicalW(a.Name, b.Name);
}
}
Just thought I'd add to this (with the most concise solution I could find):
public static IOrderedEnumerable<T> OrderByAlphaNumeric<T>(this IEnumerable<T> source, Func<T, string> selector)
{
int max = source
.SelectMany(i => Regex.Matches(selector(i), @"\d+").Cast<Match>().Select(m => (int?)m.Value.Length))
.Max() ?? 0;
return source.OrderBy(i => Regex.Replace(selector(i), @"\d+", m => m.Value.PadLeft(max, '0')));
}
The above pads any numbers in the string to the max length of all numbers in all strings and uses the resulting string to sort.
The cast to (int?) is to allow for collections of strings without any numbers (.Max() on an empty enumerable throws an InvalidOperationException).
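StrCmpLogicalW is only available on Windows. For a portable alternative, a much simplified managed comparer can be sketched as follows. It is an illustration under stated assumptions, not equivalent to Windows Explorer ordering: only ASCII 0-9 count as digits, other characters are compared case-insensitively via char.ToUpperInvariant rather than by culture, and remaining ties are broken ordinally so the ordering is deterministic. The `SimpleNaturalComparer` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class SimpleNaturalComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        if (ReferenceEquals(a, b)) return 0;
        if (a == null) return -1;
        if (b == null) return 1;
        int i = 0, j = 0;
        while (i < a.Length && j < b.Length)
        {
            if (IsAsciiDigit(a[i]) && IsAsciiDigit(b[j]))
            {
                // Find the extent of each digit run.
                int si = i, sj = j;
                while (i < a.Length && IsAsciiDigit(a[i])) i++;
                while (j < b.Length && IsAsciiDigit(b[j])) j++;
                // Skip leading zeros (keeping at least one digit) so "007" equals "7".
                while (si < i - 1 && a[si] == '0') si++;
                while (sj < j - 1 && b[sj] == '0') sj++;
                int lenA = i - si, lenB = j - sj;
                if (lenA != lenB) return lenA.CompareTo(lenB); // more digits = larger number
                int c = string.CompareOrdinal(a, si, b, sj, lenA);
                if (c != 0) return c;
            }
            else
            {
                int c = char.ToUpperInvariant(a[i]).CompareTo(char.ToUpperInvariant(b[j]));
                if (c != 0) return c;
                i++; j++;
            }
        }
        // The string with fewer remaining characters sorts first.
        int rest = (a.Length - i).CompareTo(b.Length - j);
        return rest != 0 ? rest : string.CompareOrdinal(a, b);
    }

    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
}
```

For the FileInfo array in the question, something like `Array.Sort(files, (x, y) => comparer.Compare(x.Name, y.Name));` would apply it to the file names.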
None of the existing implementations looked great so I wrote my own. The results are almost identical to the sorting used by modern versions of Windows Explorer (Windows 7/8). The only differences I've seen are 1) although Windows used to (e.g. XP) handle numbers of any length, it's now limited to 19 digits - mine is unlimited, 2) Windows gives inconsistent results with certain sets of Unicode digits - mine works fine (although it doesn't numerically compare digits from surrogate pairs; nor does Windows), and 3) mine can't distinguish different types of non-primary sort weights if they occur in different sections (e.g. "e-1é" vs "é1e-" - the sections before and after the number have diacritic and punctuation weight differences).
public static int CompareNatural(string strA, string strB) {
return CompareNatural(strA, strB, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase);
}
public static int CompareNatural(string strA, string strB, CultureInfo culture, CompareOptions options) {
CompareInfo cmp = culture.CompareInfo;
int iA = 0;
int iB = 0;
int softResult = 0;
int softResultWeight = 0;
while (iA < strA.Length && iB < strB.Length) {
bool isDigitA = Char.IsDigit(strA[iA]);
bool isDigitB = Char.IsDigit(strB[iB]);
if (isDigitA != isDigitB) {
return cmp.Compare(strA, iA, strB, iB, options);
}
else if (!isDigitA && !isDigitB) {
int jA = iA + 1;
int jB = iB + 1;
while (jA < strA.Length && !Char.IsDigit(strA[jA])) jA++;
while (jB < strB.Length && !Char.IsDigit(strB[jB])) jB++;
int cmpResult = cmp.Compare(strA, iA, jA - iA, strB, iB, jB - iB, options);
if (cmpResult != 0) {
// Certain strings may be considered different due to "soft" differences that are
// ignored if more significant differences follow, e.g. a hyphen only affects the
// comparison if no other differences follow
string sectionA = strA.Substring(iA, jA - iA);
string sectionB = strB.Substring(iB, jB - iB);
if (cmp.Compare(sectionA + "1", sectionB + "2", options) ==
cmp.Compare(sectionA + "2", sectionB + "1", options))
{
return cmp.Compare(strA, iA, strB, iB, options);
}
else if (softResultWeight < 1) {
softResult = cmpResult;
softResultWeight = 1;
}
}
iA = jA;
iB = jB;
}
else {
char zeroA = (char)(strA[iA] - (int)Char.GetNumericValue(strA[iA]));
char zeroB = (char)(strB[iB] - (int)Char.GetNumericValue(strB[iB]));
int jA = iA;
int jB = iB;
while (jA < strA.Length && strA[jA] == zeroA) jA++;
while (jB < strB.Length && strB[jB] == zeroB) jB++;
int resultIfSameLength = 0;
do {
isDigitA = jA < strA.Length && Char.IsDigit(strA[jA]);
isDigitB = jB < strB.Length && Char.IsDigit(strB[jB]);
int numA = isDigitA ? (int)Char.GetNumericValue(strA[jA]) : 0;
int numB = isDigitB ? (int)Char.GetNumericValue(strB[jB]) : 0;
if (isDigitA && (char)(strA[jA] - numA) != zeroA) isDigitA = false;
if (isDigitB && (char)(strB[jB] - numB) != zeroB) isDigitB = false;
if (isDigitA && isDigitB) {
if (numA != numB && resultIfSameLength == 0) {
resultIfSameLength = numA < numB ? -1 : 1;
}
jA++;
jB++;
}
}
while (isDigitA && isDigitB);
if (isDigitA != isDigitB) {
// One number has more digits than the other (ignoring leading zeros) - the longer
// number must be larger
return isDigitA ? 1 : -1;
}
else if (resultIfSameLength != 0) {
// Both numbers are the same length (ignoring leading zeros) and at least one of
// the digits differed - the first difference determines the result
return resultIfSameLength;
}
int lA = jA - iA;
int lB = jB - iB;
if (lA != lB) {
// Both numbers are equivalent but one has more leading zeros
return lA > lB ? -1 : 1;
}
else if (zeroA != zeroB && softResultWeight < 2) {
softResult = cmp.Compare(strA, iA, 1, strB, iB, 1, options);
softResultWeight = 2;
}
iA = jA;
iB = jB;
}
}
if (iA < strA.Length || iB < strB.Length) {
return iA < strA.Length ? 1 : -1;
}
else if (softResult != 0) {
return softResult;
}
return 0;
}
The signature matches the Comparison<string> delegate:
string[] files = Directory.GetFiles(@"C:\");
Array.Sort(files, CompareNatural);
Here's a wrapper class for use as IComparer<string>:
public class CustomComparer<T> : IComparer<T> {
private Comparison<T> _comparison;
public CustomComparer(Comparison<T> comparison) {
_comparison = comparison;
}
public int Compare(T x, T y) {
return _comparison(x, y);
}
}
Example:
string[] files = Directory.EnumerateFiles(@"C:\")
.OrderBy(f => f, new CustomComparer<string>(CompareNatural))
.ToArray();
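On .NET Framework 4.5 and later, the built-in `Comparer<T>.Create(Comparison<T>)` does the same job as the `CustomComparer<T>` wrapper. The sketch below uses a hypothetical stand-in comparison (length, then ordinal) so that it is self-contained. In practice you would pass `CompareNatural` instead:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComparerCreateDemo
{
    // Hypothetical stand-in for CompareNatural: orders by length, then ordinally.
    public static int ByLengthThenOrdinal(string a, string b)
    {
        int byLength = a.Length.CompareTo(b.Length);
        return byLength != 0 ? byLength : string.CompareOrdinal(a, b);
    }

    public static string[] Sort(IEnumerable<string> items)
    {
        // Comparer<T>.Create wraps a Comparison<T> delegate in an IComparer<T>,
        // which is what OrderBy expects.
        IComparer<string> comparer = Comparer<string>.Create(ByLengthThenOrdinal);
        return items.OrderBy(s => s, comparer).ToArray();
    }
}
```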
Here's a good set of filenames I use for testing:
Func<string, string> expand = (s) => { int o; while ((o = s.IndexOf('\\')) != -1) { int p = o + 1;
int z = 1; while (s[p] == '0') { z++; p++; } int c = Int32.Parse(s.Substring(p, z));
s = s.Substring(0, o) + new string(s[o - 1], c) + s.Substring(p + z); } return s; };
string encodedFileNames =
"KDEqLW4xMiotbjEzKjAwMDFcMDY2KjAwMlwwMTcqMDA5XDAxNyowMlwwMTcqMDlcMDE3KjEhKjEtISox" +
"LWEqMS4yNT8xLjI1KjEuNT8xLjUqMSoxXDAxNyoxXDAxOCoxXDAxOSoxXDA2NioxXDA2NyoxYSoyXDAx" +
"NyoyXDAxOCo5XDAxNyo5XDAxOCo5XDA2Nio9MSphMDAxdGVzdDAxKmEwMDF0ZXN0aW5nYTBcMzEqYTAw" +
"Mj9hMDAyIGE/YTAwMiBhKmEwMDIqYTAwMmE/YTAwMmEqYTAxdGVzdGluZ2EwMDEqYTAxdnNmcyphMSph" +
"MWEqYTF6KmEyKmIwMDAzcTYqYjAwM3E0KmIwM3E1KmMtZSpjZCpjZipmIDEqZipnP2cgMT9oLW4qaG8t" +
"bipJKmljZS1jcmVhbT9pY2VjcmVhbT9pY2VjcmVhbS0/ajBcNDE/ajAwMWE/ajAxP2shKmsnKmstKmsx" +
"KmthKmxpc3QqbTAwMDNhMDA1YSptMDAzYTAwMDVhKm0wMDNhMDA1Km0wMDNhMDA1YSpuMTIqbjEzKm8t" +
"bjAxMypvLW4xMipvLW40P28tbjQhP28tbjR6P28tbjlhLWI1Km8tbjlhYjUqb24wMTMqb24xMipvbjQ/" +
"b240IT9vbjR6P29uOWEtYjUqb245YWI1Km/CrW4wMTMqb8KtbjEyKnAwMCpwMDEqcDAxwr0hKnAwMcK9" +
"KnAwMcK9YSpwMDHCvcK+KnAwMipwMMK9KnEtbjAxMypxLW4xMipxbjAxMypxbjEyKnItMDAhKnItMDAh" +
"NSpyLTAwIe+8lSpyLTAwYSpyLe+8kFwxIS01KnIt77yQXDEhLe+8lSpyLe+8kFwxISpyLe+8kFwxITUq" +
"ci3vvJBcMSHvvJUqci3vvJBcMWEqci3vvJBcMyE1KnIwMCEqcjAwLTUqcjAwLjUqcjAwNSpyMDBhKnIw" +
"NSpyMDYqcjQqcjUqctmg2aYqctmkKnLZpSpy27Dbtipy27Qqctu1KnLfgN+GKnLfhCpy34UqcuClpuCl" +
"rCpy4KWqKnLgpasqcuCnpuCnrCpy4KeqKnLgp6sqcuCppuCprCpy4KmqKnLgqasqcuCrpuCrrCpy4Kuq" +
"KnLgq6sqcuCtpuCtrCpy4K2qKnLgrasqcuCvpuCvrCpy4K+qKnLgr6sqcuCxpuCxrCpy4LGqKnLgsasq" +
"cuCzpuCzrCpy4LOqKnLgs6sqcuC1puC1rCpy4LWqKnLgtasqcuC5kOC5lipy4LmUKnLguZUqcuC7kOC7" +
"lipy4LuUKnLgu5UqcuC8oOC8pipy4LykKnLgvKUqcuGBgOGBhipy4YGEKnLhgYUqcuGCkOGClipy4YKU" +
"KnLhgpUqcuGfoOGfpipy4Z+kKnLhn6UqcuGgkOGglipy4aCUKnLhoJUqcuGlhuGljCpy4aWKKnLhpYsq" +
"cuGnkOGnlipy4aeUKnLhp5UqcuGtkOGtlipy4a2UKnLhrZUqcuGusOGutipy4a60KnLhrrUqcuGxgOGx" +
"hipy4bGEKnLhsYUqcuGxkOGxlipy4bGUKnLhsZUqcuqYoFwx6pilKnLqmKDqmKUqcuqYoOqYpipy6pik" +
"KnLqmKUqcuqjkOqjlipy6qOUKnLqo5UqcuqkgOqkhipy6qSEKnLqpIUqcuqpkOqplipy6qmUKnLqqZUq" +
"cvCQkqAqcvCQkqUqcvCdn5gqcvCdn50qcu+8kFwxISpy77yQXDEt77yVKnLvvJBcMS7vvJUqcu+8kFwx" +
"YSpy77yQXDHqmKUqcu+8kFwx77yO77yVKnLvvJBcMe+8lSpy77yQ77yVKnLvvJDvvJYqcu+8lCpy77yV" +
"KnNpKnPEsSp0ZXN02aIqdGVzdNmi2aAqdGVzdNmjKnVBZS0qdWFlKnViZS0qdUJlKnVjZS0xw6kqdWNl" +
"McOpLSp1Y2Uxw6kqdWPDqS0xZSp1Y8OpMWUtKnVjw6kxZSp3ZWlhMSp3ZWlhMip3ZWlzczEqd2Vpc3My" +
"KndlaXoxKndlaXoyKndlacOfMSp3ZWnDnzIqeSBhMyp5IGE0KnknYTMqeSdhNCp5K2EzKnkrYTQqeS1h" +
"Myp5LWE0KnlhMyp5YTQqej96IDA1MD96IDIxP3ohMjE/ejIwP3oyMj96YTIxP3rCqTIxP1sxKl8xKsKt" +
"bjEyKsKtbjEzKsSwKg==";
string[] fileNames = Encoding.UTF8.GetString(Convert.FromBase64String(encodedFileNames))
.Replace("*", ".txt?").Split(new[] { "?" }, StringSplitOptions.RemoveEmptyEntries)
.Select(n => expand(n)).ToArray();
Matthew Horsley's answer is the fastest method that doesn't change behaviour depending on which version of Windows your program is running on. However, it can be made even faster by creating the regex once and using RegexOptions.Compiled. I also added the option of passing in a string comparer so you can ignore case if needed, and improved readability a bit.
public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> items, Func<T, string> selector, StringComparer stringComparer = null)
{
var regex = new Regex(@"\d+", RegexOptions.Compiled);
int maxDigits = items
.SelectMany(i => regex.Matches(selector(i)).Cast<Match>().Select(digitChunk => (int?)digitChunk.Value.Length))
.Max() ?? 0;
return items.OrderBy(i => regex.Replace(selector(i), match => match.Value.PadLeft(maxDigits, '0')), stringComparer ?? StringComparer.CurrentCulture);
}
Usage:
var sortedEmployees = employees.OrderByNatural(emp => emp.Name);
This takes 450ms to sort 100,000 strings, compared to 300ms for the default .NET string comparison - pretty fast!
Pure C# solution for linq orderby:
http://zootfroot.blogspot.com/2009/09/natural-sort-compare-with-linq-orderby.html
public class NaturalSortComparer<T> : IComparer<string>, IDisposable
{
private bool isAscending;
public NaturalSortComparer(bool inAscendingOrder = true)
{
this.isAscending = inAscendingOrder;
}
#region IComparer<string> Members
public int Compare(string x, string y)
{
throw new NotImplementedException();
}
#endregion
#region IComparer<string> Members
int IComparer<string>.Compare(string x, string y)
{
if (x == y)
return 0;
string[] x1, y1;
if (!table.TryGetValue(x, out x1))
{
x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
table.Add(x, x1);
}
if (!table.TryGetValue(y, out y1))
{
y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");
table.Add(y, y1);
}
int returnVal;
for (int i = 0; i < x1.Length && i < y1.Length; i++)
{
if (x1[i] != y1[i])
{
returnVal = PartCompare(x1[i], y1[i]);
return isAscending ? returnVal : -returnVal;
}
}
if (y1.Length > x1.Length)
{
returnVal = 1;
}
else if (x1.Length > y1.Length)
{
returnVal = -1;
}
else
{
returnVal = 0;
}
return isAscending ? returnVal : -returnVal;
}
private static int PartCompare(string left, string right)
{
int x, y;
if (!int.TryParse(left, out x))
return left.CompareTo(right);
if (!int.TryParse(right, out y))
return left.CompareTo(right);
return x.CompareTo(y);
}
#endregion
private Dictionary<string, string[]> table = new Dictionary<string, string[]>();
public void Dispose()
{
table.Clear();
table = null;
}
}
My solution:
void Main()
{
new[] {"a4","a3","a2","a10","b5","b4","b400","1","C1d","c1d2"}.OrderBy(x => x, new NaturalStringComparer()).Dump();
}
public class NaturalStringComparer : IComparer<string>
{
private static readonly Regex _re = new Regex(@"(?<=\D)(?=\d)|(?<=\d)(?=\D)", RegexOptions.Compiled);
public int Compare(string x, string y)
{
x = x.ToLower();
y = y.ToLower();
if(string.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length)) == 0)
{
if(x.Length == y.Length) return 0;
return x.Length < y.Length ? -1 : 1;
}
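var a = _re.Split(x);
var b = _re.Split(y);
// Compare segment by segment, stopping at the shorter list: segments can compare
// equal while the strings differ (e.g. "a1" vs "a01"), and indexing past the end
// of either array would throw IndexOutOfRangeException
for (int i = 0; i < a.Length && i < b.Length; i++)
{
int r = PartCompare(a[i], b[i]);
if(r != 0) return r;
}
// All paired segments are equal: fewer segments first, then plain ordinal order
if (a.Length != b.Length) return a.Length.CompareTo(b.Length);
return string.CompareOrdinal(x, y);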
}
private static int PartCompare(string x, string y)
{
int a, b;
if(int.TryParse(x, out a) && int.TryParse(y, out b))
return a.CompareTo(b);
return x.CompareTo(y);
}
}
Results:
1
a2
a3
a4
a10
b4
b5
b400
C1d
c1d2
You do need to be careful -- I vaguely recall reading that StrCmpLogicalW, or something like it, was not strictly transitive, and I have observed .NET's sort methods to sometimes get stuck in infinite loops if the comparison function breaks that rule.
A transitive comparison will always report that a < c if a < b and b < c. There exists a function that does a natural sort order comparison that does not always meet that criterion, but I can't recall whether it is StrCmpLogicalW or something else.
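One way to catch this kind of problem in a test is a brute-force check over every triple in a sample of inputs. The sketch below uses a hypothetical helper name. It can only find violations within the given sample and cannot prove that a comparer is transitive in general:

```csharp
using System;
using System.Collections.Generic;

public static class ComparerChecks
{
    // Returns false if some a, b, c in the sample has a < b and b < c but not a < c.
    public static bool IsTransitiveOn(IComparer<string> comparer, IReadOnlyList<string> sample)
    {
        foreach (var a in sample)
            foreach (var b in sample)
                foreach (var c in sample)
                {
                    if (comparer.Compare(a, b) < 0 && comparer.Compare(b, c) < 0
                        && comparer.Compare(a, c) >= 0)
                        return false;
                }
        return true;
    }
}
```

Running this over a set of test file names (such as the encoded list further down) with each candidate comparer is cheap for a few hundred strings, since it is O(n³) comparisons.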
This is my code to sort a string having both alpha and numeric characters.
First, this extension method:
public static IEnumerable<string> AlphanumericSort(this IEnumerable<string> me)
{
return me.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(50, '0')));
}
Then, simply use it anywhere in your code like this:
List<string> test = new List<string>() { "The 1st", "The 12th", "The 2nd" };
test = test.AlphanumericSort().ToList();
How does it work? It pads every number with leading zeros, so that plain string sorting puts the numbers in numeric order:
Original | Regex Replace | The | Returned
List | Apply PadLeft | Sorting | List
| | |
"The 1st" | "The 001st" | "The 001st" | "The 1st"
"The 12th" | "The 012th" | "The 002nd" | "The 2nd"
"The 2nd" | "The 002nd" | "The 012th" | "The 12th"
It also works with multiple numbers:
Alphabetical Sorting | Alphanumeric Sorting
|
"Page 21, Line 42" | "Page 3, Line 7"
"Page 21, Line 5" | "Page 3, Line 32"
"Page 3, Line 32" | "Page 21, Line 5"
"Page 3, Line 7" | "Page 21, Line 42"
Hope that helps.
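A minimal sketch of just the key-building step, with the width made a parameter. `PadNumbers` is a hypothetical name. The extension method hard-codes 50, and the tables above use 3 for readability:

```csharp
using System.Text.RegularExpressions;

public static class PaddingDemo
{
    // Replaces every run of digits with the same run left-padded with zeros to 'width'.
    public static string PadNumbers(string s, int width)
    {
        return Regex.Replace(s, @"\d+", m => m.Value.PadLeft(width, '0'));
    }
}
```

`PadLeft` never truncates, so a digit run longer than the width is left unchanged and will not sort numerically against shorter runs. The earlier answer that computes the maximum digit-run length from the data avoids picking a fixed width.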
Here's a version for .NET Core 2.1+ / .NET 5.0+ that uses spans to avoid allocations:
public class NaturalSortStringComparer : IComparer<string>
{
public static NaturalSortStringComparer Ordinal { get; } = new NaturalSortStringComparer(StringComparison.Ordinal);
public static NaturalSortStringComparer OrdinalIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.OrdinalIgnoreCase);
public static NaturalSortStringComparer CurrentCulture { get; } = new NaturalSortStringComparer(StringComparison.CurrentCulture);
public static NaturalSortStringComparer CurrentCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.CurrentCultureIgnoreCase);
public static NaturalSortStringComparer InvariantCulture { get; } = new NaturalSortStringComparer(StringComparison.InvariantCulture);
public static NaturalSortStringComparer InvariantCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.InvariantCultureIgnoreCase);
private readonly StringComparison _comparison;
public NaturalSortStringComparer(StringComparison comparison)
{
_comparison = comparison;
}
public int Compare(string x, string y)
{
// Let string.Compare handle the case where x or y is null
if (x is null || y is null)
return string.Compare(x, y, _comparison);
var xSegments = GetSegments(x);
var ySegments = GetSegments(y);
while (xSegments.MoveNext() && ySegments.MoveNext())
{
int cmp;
// If they're both numbers, compare the value
if (xSegments.CurrentIsNumber && ySegments.CurrentIsNumber)
{
var xValue = long.Parse(xSegments.Current);
var yValue = long.Parse(ySegments.Current);
cmp = xValue.CompareTo(yValue);
if (cmp != 0)
return cmp;
}
// If x is a number and y is not, x is "lesser than" y
else if (xSegments.CurrentIsNumber)
{
return -1;
}
// If y is a number and x is not, x is "greater than" y
else if (ySegments.CurrentIsNumber)
{
return 1;
}
// OK, neither are number, compare the segments as text
cmp = xSegments.Current.CompareTo(ySegments.Current, _comparison);
if (cmp != 0)
return cmp;
}
// At this point, either all segments are equal, or one string is shorter than the other
// If x is shorter, it's "lesser than" y
if (x.Length < y.Length)
return -1;
// If x is longer, it's "greater than" y
if (x.Length > y.Length)
return 1;
// If they have the same length, they're equal
return 0;
}
private static StringSegmentEnumerator GetSegments(string s) => new StringSegmentEnumerator(s);
private struct StringSegmentEnumerator
{
private readonly string _s;
private int _start;
private int _length;
public StringSegmentEnumerator(string s)
{
_s = s;
_start = -1;
_length = 0;
CurrentIsNumber = false;
}
public ReadOnlySpan<char> Current => _s.AsSpan(_start, _length);
public bool CurrentIsNumber { get; private set; }
public bool MoveNext()
{
var currentPosition = _start >= 0
? _start + _length
: 0;
if (currentPosition >= _s.Length)
return false;
int start = currentPosition;
bool isFirstCharDigit = Char.IsDigit(_s[currentPosition]);
while (++currentPosition < _s.Length && Char.IsDigit(_s[currentPosition]) == isFirstCharDigit)
{
}
_start = start;
_length = currentPosition - start;
CurrentIsNumber = isFirstCharDigit;
return true;
}
}
}
Adding to Greg Beech's answer (because I've just been searching for that), if you want to use this from Linq you can use the OrderBy that takes an IComparer. E.g.:
var items = new List<MyItem>();
// fill items
var sorted = items.OrderBy(item => item.Name, new NaturalStringComparer());
Here's a relatively simple example that doesn't use P/Invoke and avoids any allocation during execution.
Feel free to use the code from here, or if it's easier there's a NuGet package:
https://www.nuget.org/packages/NaturalSort
https://github.com/drewnoakes/natural-sort
internal sealed class NaturalStringComparer : IComparer<string>
{
public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();
public int Compare(string x, string y)
{
// sort nulls to the start
if (x == null)
return y == null ? 0 : -1;
if (y == null)
return 1;
var ix = 0;
var iy = 0;
while (true)
{
// sort shorter strings to the start
if (ix >= x.Length)
return iy >= y.Length ? 0 : -1;
if (iy >= y.Length)
return 1;
var cx = x[ix];
var cy = y[iy];
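int result;
if (char.IsDigit(cx) && char.IsDigit(cy))
{
// CompareInteger advances ix and iy past both numbers,
// so continue without the extra increment below
result = CompareInteger(x, y, ref ix, ref iy);
if (result != 0)
return result;
continue;
}
result = cx.CompareTo(cy);
if (result != 0)
return result;
ix++;
iy++;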
}
}
private static int CompareInteger(string x, string y, ref int ix, ref int iy)
{
var lx = GetNumLength(x, ix);
var ly = GetNumLength(y, iy);
// shorter number first (note, doesn't handle leading zeroes)
if (lx != ly)
return lx.CompareTo(ly);
for (var i = 0; i < lx; i++)
{
var result = x[ix++].CompareTo(y[iy++]);
if (result != 0)
return result;
}
return 0;
}
private static int GetNumLength(string s, int i)
{
var length = 0;
while (i < s.Length && char.IsDigit(s[i++]))
length++;
return length;
}
}
It doesn't ignore leading zeroes, so 01 comes after 2.
Corresponding unit test:
public class NumericStringComparerTests
{
[Fact]
public void OrdersCorrectly()
{
AssertEqual("", "");
AssertEqual(null, null);
AssertEqual("Hello", "Hello");
AssertEqual("Hello123", "Hello123");
AssertEqual("123", "123");
AssertEqual("123Hello", "123Hello");
AssertOrdered("", "Hello");
AssertOrdered(null, "Hello");
AssertOrdered("Hello", "Hello1");
AssertOrdered("Hello123", "Hello124");
AssertOrdered("Hello123", "Hello133");
AssertOrdered("Hello123", "Hello223");
AssertOrdered("123", "124");
AssertOrdered("123", "133");
AssertOrdered("123", "223");
AssertOrdered("123", "1234");
AssertOrdered("123", "2345");
AssertOrdered("0", "1");
AssertOrdered("123Hello", "124Hello");
AssertOrdered("123Hello", "133Hello");
AssertOrdered("123Hello", "223Hello");
AssertOrdered("123Hello", "1234Hello");
}
private static void AssertEqual(string x, string y)
{
Assert.Equal(0, NaturalStringComparer.Instance.Compare(x, y));
Assert.Equal(0, NaturalStringComparer.Instance.Compare(y, x));
}
private static void AssertOrdered(string x, string y)
{
Assert.Equal(-1, NaturalStringComparer.Instance.Compare(x, y));
Assert.Equal( 1, NaturalStringComparer.Instance.Compare(y, x));
}
}
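The comparer above doesn't ignore leading zeroes, so 01 sorts after 2. A minimal sketch of a digit-run comparison that does ignore them follows. `CompareDigitRuns` is a hypothetical helper and is not part of the NaturalSort package. It assumes both inputs are runs of ASCII digits. Putting fewer leading zeros first on ties is an arbitrary choice here; `CompareNatural` in an earlier answer picks the opposite:

```csharp
using System;

public static class DigitRunComparison
{
    // Compares two runs of ASCII digits by numeric value without parsing,
    // so runs of any length work. Leading zeros only matter as a tie-breaker.
    public static int CompareDigitRuns(string x, string y)
    {
        string tx = x.TrimStart('0');
        string ty = y.TrimStart('0');

        // With leading zeros removed, a longer run is a larger number.
        if (tx.Length != ty.Length)
            return tx.Length.CompareTo(ty.Length);

        // Same length: the first differing digit decides.
        int byDigits = string.CompareOrdinal(tx, ty);
        if (byDigits != 0)
            return Math.Sign(byDigits);

        // Same value: the run with fewer leading zeros sorts first ("1" < "01").
        return x.Length.CompareTo(y.Length);
    }
}
```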
I've actually implemented it as an extension method on the StringComparer so that you could do for example:
StringComparer.CurrentCulture.WithNaturalSort() or
StringComparer.OrdinalIgnoreCase.WithNaturalSort().
The resulting IComparer<string> can be used in all places like OrderBy, OrderByDescending, ThenBy, ThenByDescending, SortedSet<string>, etc. And you can still easily tweak case sensitivity, culture, etc.
The implementation is fairly trivial and it should perform quite well even on large sequences.
I've also published it as a tiny NuGet package, so you can just do:
Install-Package NaturalSort.Extension
The code including XML documentation comments and suite of tests is available in the NaturalSort.Extension GitHub repository.
The entire code is this (if you cannot use C# 7 yet, just install the NuGet package):
public static class StringComparerNaturalSortExtension
{
public static IComparer<string> WithNaturalSort(this StringComparer stringComparer) => new NaturalSortComparer(stringComparer);
private class NaturalSortComparer : IComparer<string>
{
public NaturalSortComparer(StringComparer stringComparer)
{
_stringComparer = stringComparer;
}
private readonly StringComparer _stringComparer;
private static readonly Regex NumberSequenceRegex = new Regex(@"(\d+)", RegexOptions.Compiled | RegexOptions.CultureInvariant);
private static string[] Tokenize(string s) => s == null ? new string[] { } : NumberSequenceRegex.Split(s);
private static ulong ParseNumberOrZero(string s) => ulong.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out var result) ? result : 0;
public int Compare(string s1, string s2)
{
var tokens1 = Tokenize(s1);
var tokens2 = Tokenize(s2);
var zipCompare = tokens1.Zip(tokens2, TokenCompare).FirstOrDefault(x => x != 0);
if (zipCompare != 0)
return zipCompare;
var lengthCompare = tokens1.Length.CompareTo(tokens2.Length);
return lengthCompare;
}
private int TokenCompare(string token1, string token2)
{
var number1 = ParseNumberOrZero(token1);
var number2 = ParseNumberOrZero(token2);
var numberCompare = number1.CompareTo(number2);
if (numberCompare != 0)
return numberCompare;
var stringCompare = _stringComparer.Compare(token1, token2);
return stringCompare;
}
}
}
Inspired by Michael Parker's solution, here is an IComparer implementation that you can drop into any of the LINQ ordering methods:
private class NaturalStringComparer : IComparer<string>
{
public int Compare(string left, string right)
{
int max = new[] { left, right }
.SelectMany(x => Regex.Matches(x, @"\d+").Cast<Match>().Select(y => (int?)y.Value.Length))
.Max() ?? 0;
var leftPadded = Regex.Replace(left, @"\d+", m => m.Value.PadLeft(max, '0'));
var rightPadded = Regex.Replace(right, @"\d+", m => m.Value.PadLeft(max, '0'));
return string.Compare(leftPadded, rightPadded);
}
}
Here is a naive one-line, regex-less LINQ way (borrowed from Python):
var alphaStrings = new List<string>() { "10","2","3","4","50","11","100","a12","b12" };
var orderedString = alphaStrings.OrderBy(g => new Tuple<int, string>(g.ToCharArray().All(char.IsDigit)? int.Parse(g) : int.MaxValue, g));
// Order Now: ["2","3","4","10","11","50","100","a12","b12"]
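The one-liner throws on an empty string, because `All()` is vacuously true and `int.Parse("")` fails. It also overflows for digit strings beyond `int.MaxValue`. A guarded sketch of the same idea follows, with a hypothetical class name. Two changes are deliberate: it uses `long.TryParse`, and it orders ties with `ThenBy` and an ordinal comparer instead of a `Tuple`:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TupleSortDemo
{
    // All-digit strings sort by numeric value; everything else (including "" and
    // digit strings too long for a long) sorts after them, ordinally among themselves.
    public static List<string> Order(IEnumerable<string> items)
    {
        return items
            .OrderBy(g => g.Length > 0 && g.All(char.IsDigit) && long.TryParse(g, out var n)
                ? n : long.MaxValue)
            .ThenBy(g => g, System.StringComparer.Ordinal)
            .ToList();
    }
}
```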
Expanding on a couple of the previous answers and using extension methods, I came up with the following. It avoids enumerating the source multiple times, avoids creating multiple Regex objects, and avoids calling the regex needlessly. That said, it does use ToList(), which can negate these benefits for larger collections.
The selector is generic, so any delegate can be passed in: each element of the source collection is projected by the selector, and the result is converted to a string with ToString().
private static readonly Regex _NaturalOrderExpr = new Regex(@"\d+", RegexOptions.Compiled);
public static IEnumerable<TSource> OrderByNatural<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
int max = 0;
var selection = source.Select(
o =>
{
var v = selector(o);
var s = v != null ? v.ToString() : String.Empty;
if (!String.IsNullOrWhiteSpace(s))
{
var mc = _NaturalOrderExpr.Matches(s);
if (mc.Count > 0)
{
max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
}
}
return new
{
Key = o,
Value = s
};
}).ToList();
return
selection.OrderBy(
o =>
String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
.Select(o => o.Key);
}
public static IEnumerable<TSource> OrderByDescendingNatural<TSource, TKey>(
this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
int max = 0;
var selection = source.Select(
o =>
{
var v = selector(o);
var s = v != null ? v.ToString() : String.Empty;
if (!String.IsNullOrWhiteSpace(s))
{
var mc = _NaturalOrderExpr.Matches(s);
if (mc.Count > 0)
{
max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
}
}
return new
{
Key = o,
Value = s
};
}).ToList();
return
selection.OrderByDescending(
o =>
String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
.Select(o => o.Key);
}
A version that's easier to read/maintain.
public class NaturalStringComparer : IComparer<string>
{
public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();
public int Compare(string x, string y) {
const int LeftIsSmaller = -1;
const int RightIsSmaller = 1;
const int Equal = 0;
var leftString = x;
var rightString = y;
var stringComparer = CultureInfo.CurrentCulture.CompareInfo;
int rightIndex;
int leftIndex;
for (leftIndex = 0, rightIndex = 0;
leftIndex < leftString.Length && rightIndex < rightString.Length;
leftIndex++, rightIndex++) {
var leftChar = leftString[leftIndex];
var rightChar = rightString[rightIndex];
var leftIsNumber = char.IsNumber(leftChar);
var rightIsNumber = char.IsNumber(rightChar);
if (!leftIsNumber && !rightIsNumber) {
var result = stringComparer.Compare(leftString, leftIndex, 1, rightString, rightIndex, 1);
if (result != 0) return result;
} else if (leftIsNumber && !rightIsNumber) {
return LeftIsSmaller;
} else if (!leftIsNumber && rightIsNumber) {
return RightIsSmaller;
} else {
var leftNumberLength = NumberLength(leftString, leftIndex, out var leftNumber);
var rightNumberLength = NumberLength(rightString, rightIndex, out var rightNumber);
if (leftNumberLength < rightNumberLength) {
return LeftIsSmaller;
} else if (leftNumberLength > rightNumberLength) {
return RightIsSmaller;
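} else {
if(leftNumber < rightNumber) {
return LeftIsSmaller;
} else if(leftNumber > rightNumber) {
return RightIsSmaller;
}
// Equal numbers: skip the rest of their digits (the for loop adds the final 1)
leftIndex += leftNumberLength - 1;
rightIndex += rightNumberLength - 1;
}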
}
}
if (leftString.Length < rightString.Length) {
return LeftIsSmaller;
} else if(leftString.Length > rightString.Length) {
return RightIsSmaller;
}
return Equal;
}
public int NumberLength(string str, int offset, out int number) {
if (string.IsNullOrWhiteSpace(str)) throw new ArgumentNullException(nameof(str));
if (offset >= str.Length) throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be less than the length of the string.");
var currentOffset = offset;
var curChar = str[currentOffset];
if (!char.IsNumber(curChar))
throw new ArgumentException($"'{curChar}' is not a number.", nameof(offset));
// Scan to the end of the run of numeric characters that starts at offset
var end = offset + 1;
while (end < str.Length && char.IsNumber(str[end]))
end++;
var length = end - offset;
// Parse the whole run; note int.Parse only accepts ASCII digits and
// overflows for runs longer than about 9 digits
number = int.Parse(str.Substring(offset, length));
return length;
}
}
We had a need for a natural sort to deal with text with the following pattern:
"Test 1-1-1 something"
"Test 1-2-3 something"
...
For some reason, when I first looked on SO I didn't find this post, so we implemented our own. It is similar in concept to some of the solutions presented here, but it may have the benefit of being simpler and easier to understand. However, although I did try to address performance bottlenecks, it is still a much slower implementation than the default OrderBy().
Here is the extension method I implemented:
public static class EnumerableExtensions
{
// set up the regex parser once and for all
private static readonly Regex Regex = new Regex(@"\d+|\D+", RegexOptions.Compiled | RegexOptions.Singleline);
// stateless comparer can be built once
private static readonly AggregateComparer Comparer = new AggregateComparer();
public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> source, Func<T, string> selector)
{
// first extract string from object using selector
// then extract digit and non-digit groups
Func<T, IEnumerable<IComparable>> splitter =
s => Regex.Matches(selector(s))
.Cast<Match>()
.Select(m => Char.IsDigit(m.Value[0]) ? (IComparable) int.Parse(m.Value) : m.Value)
.ToList(); // materialize so the regex runs only once per element, not on every comparison
return source.OrderBy(splitter, Comparer);
}
/// <summary>
/// This comparer will compare two lists of objects against each other
/// </summary>
/// <remarks>Objects in each list are compared to their corresponding elements in the other
/// list until a difference is found.</remarks>
private class AggregateComparer : IComparer<IEnumerable<IComparable>>
{
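public int Compare(IEnumerable<IComparable> x, IEnumerable<IComparable> y)
{
var result = x.Zip(y, (a, b) => CompareBlocks(a, b)) // walk both lists, comparing each pair
.FirstOrDefault(r => r != 0); // until a difference is found
// if one list is a prefix of the other, the shorter one sorts first
return result != 0 ? result : x.Count().CompareTo(y.Count());
}
private static int CompareBlocks(IComparable a, IComparable b)
{
// an int block can't be compared with a string block (CompareTo throws
// ArgumentException), so sort numbers before text in that case
if (a.GetType() != b.GetType())
return a is int ? -1 : 1;
return a.CompareTo(b);
}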
}
}
The idea is to split the original strings into blocks of digits and non-digits ("\d+|\D+"). Since this is a potentially expensive task, it is done only once per entry. We then use a comparer of comparable objects (sorry, I can't find a more proper way to say it). It compares each block to its corresponding block in the other string.
I would like feedback on how this could be improved and what the major flaws are. Note that maintainability is important to us at this point and we are not currently using this in extremely large data sets.
Let me explain my problem and how I was able to solve it.
Problem: Sort files by FileName from FileInfo objects retrieved from a directory.
Solution: I selected the file names from the FileInfo objects and trimmed the ".png" extension from each name. Then I just call List.Sort(), which, for my file names, produced natural sorting order. Based on my testing, I found that leaving ".png" on the names messes up the sort order. Have a look at the code below:
var imageNameList = new DirectoryInfo(@"C:\Temp\Images").GetFiles("*.png").Select(x =>x.Name.Substring(0, x.Name.Length - 4)).ToList();
imageNameList.Sort();

Formatting numbers with significant figures in C#

I have some decimal data that I am pushing into a SharePoint list where it is to be viewed. I'd like to restrict the number of significant figures displayed in the result data based on my knowledge of the specific calculation. Sometimes it'll be 3, so 12345 will become 12300 and 0.012345 will become 0.0123. Occasionally it will be 4 or 5. Is there any convenient way to handle this?
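For reference, here is a minimal sketch of the usual scale-and-round approach. The class name is hypothetical, and this is not taken from any answer below. It scales the value so the digits to keep sit left of the decimal point, rounds, then scales back. Because `double` is binary, a result like 0.0123 is only the nearest representable value:

```csharp
using System;

public static class SigFigs
{
    public static double Round(double value, int figures)
    {
        if (figures < 1)
            throw new ArgumentOutOfRangeException(nameof(figures));
        if (value == 0 || double.IsNaN(value) || double.IsInfinity(value))
            return value;

        // Exponent of the leading digit: 4 for 12345, -2 for 0.012345.
        int leading = (int)Math.Floor(Math.Log10(Math.Abs(value)));
        // Power of ten of the last digit to keep: 10^2 for 12345 at 3 figures.
        double scale = Math.Pow(10, leading - figures + 1);
        return Math.Round(value / scale, MidpointRounding.AwayFromZero) * scale;
    }
}
```

Display is a separate concern: formatting the result (for example with `ToString("G3")` or a custom format) decides whether trailing zeros such as the one in 0.0020 are shown.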
See: RoundToSignificantFigures by "P Daddy".
I've combined his method with another one I liked.
Rounding to significant figures is a lot easier in T-SQL, where rounding is based on a rounding position rather than a number of decimal places (which is what .NET's Math.Round uses). In T-SQL you can round a number to a negative number of places, which rounds at whole-number positions, so no scaling is needed.
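The rounding position used in the code below (`roundingPosition = significantDigits - 1 - floor(log10(|value|))`) works like this for the question's examples, with $n$ significant figures. A positive $p$ means decimal places and a negative $p$ means rounding at tens, hundreds, and so on:

```latex
p = n - 1 - \left\lfloor \log_{10} \lvert x \rvert \right\rfloor,
\qquad
\operatorname{round}_n(x) = \frac{\operatorname{round}\!\left(x \cdot 10^{p}\right)}{10^{p}}

x = 12345,\ n = 3:\quad \lfloor \log_{10} 12345 \rfloor = 4,\quad p = -2,\quad
\frac{\operatorname{round}(123.45)}{10^{-2}} = 12300

x = 0.012345,\ n = 3:\quad \lfloor \log_{10} 0.012345 \rfloor = -2,\quad p = 4,\quad
\frac{\operatorname{round}(123.45)}{10^{4}} = 0.0123
```

Math.Round only accepts 0 to 15 decimal places, so negative positions (and positions beyond 15) need the explicit scaling step.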
Also see this other thread. Pyrolistical's method is good.
The trailing zeros part of the problem seems like more of a string operation to me, so I included a ToString() extension method which will pad zeros if necessary.
using System;
using System.Globalization;
public static class Precision
{
// 2^-24
public const float FLOAT_EPSILON = 0.0000000596046448f;
// 2^-53
public const double DOUBLE_EPSILON = 0.00000000000000011102230246251565d;
public static bool AlmostEquals(this double a, double b, double epsilon = DOUBLE_EPSILON)
{
// ReSharper disable CompareOfFloatsByEqualityOperator
if (a == b)
{
return true;
}
// ReSharper restore CompareOfFloatsByEqualityOperator
return (System.Math.Abs(a - b) < epsilon);
}
public static bool AlmostEquals(this float a, float b, float epsilon = FLOAT_EPSILON)
{
// ReSharper disable CompareOfFloatsByEqualityOperator
if (a == b)
{
return true;
}
// ReSharper restore CompareOfFloatsByEqualityOperator
return (System.Math.Abs(a - b) < epsilon);
}
}
public static class SignificantDigits
{
public static double Round(this double value, int significantDigits)
{
int unneededRoundingPosition;
return RoundSignificantDigits(value, significantDigits, out unneededRoundingPosition);
}
public static string ToString(this double value, int significantDigits)
{
// this method will round and then append zeros if needed.
// i.e. if you round .002 to two significant figures, the resulting number should be .0020.
var currentInfo = CultureInfo.CurrentCulture.NumberFormat;
if (double.IsNaN(value))
{
return currentInfo.NaNSymbol;
}
if (double.IsPositiveInfinity(value))
{
return currentInfo.PositiveInfinitySymbol;
}
if (double.IsNegativeInfinity(value))
{
return currentInfo.NegativeInfinitySymbol;
}
int roundingPosition;
var roundedValue = RoundSignificantDigits(value, significantDigits, out roundingPosition);
// when rounding causes a cascading round affecting digits of greater significance,
// need to re-round to get a correct rounding position afterwards
// this fixes a bug where rounding 9.96 to 2 figures yields 10.0 instead of 10
RoundSignificantDigits(roundedValue, significantDigits, out roundingPosition);
if (Math.Abs(roundingPosition) > 9)
{
// use exponential notation format
// ReSharper disable FormatStringProblem
return string.Format(currentInfo, "{0:E" + (significantDigits - 1) + "}", roundedValue);
// ReSharper restore FormatStringProblem
}
// string.format is only needed with decimal numbers (whole numbers won't need to be padded with zeros to the right.)
// ReSharper disable FormatStringProblem
return roundingPosition > 0 ? string.Format(currentInfo, "{0:F" + roundingPosition + "}", roundedValue) : roundedValue.ToString(currentInfo);
// ReSharper restore FormatStringProblem
}
private static double RoundSignificantDigits(double value, int significantDigits, out int roundingPosition)
{
// this method will return a rounded double value at a number of significant figures.
// the significantDigits parameter must be between 1 and 15, inclusive.
roundingPosition = 0;
if (value.AlmostEquals(0d))
{
roundingPosition = significantDigits - 1;
return 0d;
}
if (double.IsNaN(value))
{
return double.NaN;
}
if (double.IsPositiveInfinity(value))
{
return double.PositiveInfinity;
}
if (double.IsNegativeInfinity(value))
{
return double.NegativeInfinity;
}
if (significantDigits < 1 || significantDigits > 15)
{
throw new ArgumentOutOfRangeException(nameof(significantDigits), significantDigits, "The significantDigits argument must be between 1 and 15.");
}
// The resulting rounding position will be negative for rounding at whole numbers, and positive for decimal places.
roundingPosition = significantDigits - 1 - (int)(Math.Floor(Math.Log10(Math.Abs(value))));
// try to use a rounding position directly, if no scale is needed.
// this is because the scale multiplication after the rounding can introduce error, although
// this only happens when you're dealing with really tiny numbers, e.g. 9.9e-14.
if (roundingPosition > 0 && roundingPosition < 16)
{
return Math.Round(value, roundingPosition, MidpointRounding.AwayFromZero);
}
// Shouldn't get here unless we need to scale it.
// Set the scaling value, for rounding whole numbers or decimals past 15 places
var scale = Math.Pow(10, Math.Ceiling(Math.Log10(Math.Abs(value))));
return Math.Round(value / scale, significantDigits, MidpointRounding.AwayFromZero) * scale;
}
}
This might do the trick:
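// "G3" rounds to 3 significant digits; parse the text back, then print with "R" (round-trip).
// "R" ignores any precision digit, so "R0" and "R6" behave exactly like "R".
double Input1 = 1234567;
string Result1 = Convert.ToDouble(String.Format("{0:G3}", Input1)).ToString("R0"); // "1230000"
double Input2 = 0.012345;
string Result2 = Convert.ToDouble(String.Format("{0:G3}", Input2)).ToString("R6"); // "0.0123" (separator follows the current culture)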
Changing the G3 to G4 produces the oddest result though.
It appears to round up the significant digits?
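The trick above relies on the "G" format specifier: given a precision (G3, G4, ...) it rounds to that many significant digits, so parsing the text back yields the rounded double. Here is a minimal sketch of the same round trip, assuming you want it to behave identically in every culture; the class and method names are made up for illustration:

```csharp
using System;
using System.Globalization;

public static class GFormatRounding
{
    // Hypothetical helper: round to `digits` significant digits by formatting
    // with "G<digits>" and parsing the text back. InvariantCulture is used on
    // both sides so the decimal separator is always '.'.
    public static double RoundViaG(double value, int digits)
    {
        string text = value.ToString("G" + digits, CultureInfo.InvariantCulture);
        return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

With this helper, RoundViaG(1234567, 3) returns 1230000 ("G3" produces "1.23E+06") and RoundViaG(0.012345, 3) returns 0.0123.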
I ended up snagging some code from http://ostermiller.org/utils/SignificantFigures.java.html. It was in Java, so I did a quick search/replace and some ReSharper reformatting to make it build as C#. It seems to work nicely for my significant-figure needs. FWIW, I removed his Javadoc comments to make it more concise here, but the original code is documented quite nicely.
/*
* Copyright (C) 2002-2007 Stephen Ostermiller
* http://ostermiller.org/contact.pl?regarding=Java+Utilities
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* See COPYING.TXT for details.
*/
using System;
using System.Text;

public class SignificantFigures
{
private String original;
private StringBuilder _digits;
private int mantissa = -1;
private bool sign = true;
private bool isZero = false;
private bool useScientificNotation = true;
public SignificantFigures(String number)
{
original = number;
Parse(original);
}
public SignificantFigures(double number)
{
original = Convert.ToString(number);
try
{
Parse(original);
}
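catch (Exception)
{
// Parse rejects anything but digits, '.', signs and an exponent (e.g. "NaN", "Infinity",
// or a ',' decimal separator): keep the original text and report no digits.
_digits = null;
}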
}
public bool UseScientificNotation
{
get { return useScientificNotation; }
set { useScientificNotation = value; }
}
public int GetNumberSignificantFigures()
{
if (_digits == null) return 0;
return _digits.Length;
}
public SignificantFigures SetLSD(int place)
{
SetLMSD(place, Int32.MinValue);
return this;
}
public SignificantFigures SetLMSD(int leastPlace, int mostPlace)
{
if (_digits != null && leastPlace != Int32.MinValue)
{
int significantFigures = _digits.Length;
int current = mantissa - significantFigures + 1;
int newLength = significantFigures - leastPlace + current;
if (newLength <= 0)
{
if (mostPlace == Int32.MinValue)
{
original = "NaN";
_digits = null;
}
else
{
newLength = mostPlace - leastPlace + 1;
_digits.Length = newLength;
mantissa = leastPlace;
for (int i = 0; i < newLength; i++)
{
_digits[i] = '0';
}
isZero = true;
sign = true;
}
}
else
{
_digits.Length = newLength;
for (int i = significantFigures; i < newLength; i++)
{
_digits[i] = '0';
}
}
}
return this;
}
public int GetLSD()
{
if (_digits == null) return Int32.MinValue;
return mantissa - _digits.Length + 1;
}
public int GetMSD()
{
if (_digits == null) return Int32.MinValue;
return mantissa + 1;
}
public override String ToString()
{
if (_digits == null) return original;
StringBuilder digits = new StringBuilder(this._digits.ToString());
int length = digits.Length;
if ((mantissa <= -4 || mantissa >= 7 ||
(mantissa >= length &&
digits[digits.Length - 1] == '0') ||
(isZero && mantissa != 0)) && useScientificNotation)
{
// use scientific notation.
if (length > 1)
{
digits.Insert(1, '.');
}
if (mantissa != 0)
{
digits.Append("E" + mantissa);
}
}
else if (mantissa <= -1)
{
digits.Insert(0, "0.");
for (int i = mantissa; i < -1; i++)
{
digits.Insert(2, '0');
}
}
else if (mantissa + 1 == length)
{
if (length > 1 && digits[digits.Length - 1] == '0')
{
digits.Append('.');
}
}
else if (mantissa < length)
{
digits.Insert(mantissa + 1, '.');
}
else
{
for (int i = length; i <= mantissa; i++)
{
digits.Append('0');
}
}
if (!sign)
{
digits.Insert(0, '-');
}
return digits.ToString();
}
public String ToScientificNotation()
{
if (_digits == null) return original;
StringBuilder digits = new StringBuilder(this._digits.ToString());
int length = digits.Length;
if (length > 1)
{
digits.Insert(1, '.');
}
if (mantissa != 0)
{
digits.Append("E" + mantissa);
}
if (!sign)
{
digits.Insert(0, '-');
}
return digits.ToString();
}
private const int INITIAL = 0;
private const int LEADZEROS = 1;
private const int MIDZEROS = 2;
private const int DIGITS = 3;
private const int LEADZEROSDOT = 4;
private const int DIGITSDOT = 5;
private const int MANTISSA = 6;
private const int MANTISSADIGIT = 7;
private void Parse(String number)
{
int length = number.Length;
_digits = new StringBuilder(length);
int state = INITIAL;
int mantissaStart = -1;
bool foundMantissaDigit = false;
// sometimes we don't know if a zero will be
// significant or not when it is encountered.
// keep track of the number of them so that
// the all can be made significant if we find
// out that they are.
int zeroCount = 0;
int leadZeroCount = 0;
for (int i = 0; i < length; i++)
{
char c = number[i];
switch (c)
{
case '.':
{
switch (state)
{
case INITIAL:
case LEADZEROS:
{
state = LEADZEROSDOT;
}
break;
case MIDZEROS:
{
// we now know that these zeros
// are more than just trailing place holders.
for (int j = 0; j < zeroCount; j++)
{
_digits.Append('0');
}
zeroCount = 0;
state = DIGITSDOT;
}
break;
case DIGITS:
{
state = DIGITSDOT;
}
break;
default:
{
throw new Exception(
"Unexpected character '" + c + "' at position " + i
);
}
}
}
break;
case '+':
{
switch (state)
{
case INITIAL:
{
sign = true;
state = LEADZEROS;
}
break;
case MANTISSA:
{
state = MANTISSADIGIT;
}
break;
default:
{
throw new Exception(
"Unexpected character '" + c + "' at position " + i
);
}
}
}
break;
case '-':
{
switch (state)
{
case INITIAL:
{
sign = false;
state = LEADZEROS;
}
break;
case MANTISSA:
{
state = MANTISSADIGIT;
}
break;
default:
{
throw new Exception(
"Unexpected character '" + c + "' at position " + i
);
}
}
}
break;
case '0':
{
switch (state)
{
case INITIAL:
case LEADZEROS:
{
// only significant if number
// is all zeros.
zeroCount++;
leadZeroCount++;
state = LEADZEROS;
}
break;
case MIDZEROS:
case DIGITS:
{
// only significant if followed
// by a decimal point or nonzero digit.
mantissa++;
zeroCount++;
state = MIDZEROS;
}
break;
case LEADZEROSDOT:
{
// only significant if number
// is all zeros.
mantissa--;
zeroCount++;
state = LEADZEROSDOT;
}
break;
case DIGITSDOT:
{
// non-leading zeros after
// a decimal point are always
// significant.
_digits.Append(c);
}
break;
case MANTISSA:
case MANTISSADIGIT:
{
foundMantissaDigit = true;
state = MANTISSADIGIT;
}
break;
default:
{
throw new Exception(
"Unexpected character '" + c + "' at position " + i
);
}
}
}
break;
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
{
switch (state)
{
case INITIAL:
case LEADZEROS:
case DIGITS:
{
zeroCount = 0;
_digits.Append(c);
mantissa++;
state = DIGITS;
}
break;
case MIDZEROS:
{
// we now know that these zeros
// are more than just trailing place holders.
for (int j = 0; j < zeroCount; j++)
{
_digits.Append('0');
}
zeroCount = 0;
_digits.Append(c);
mantissa++;
state = DIGITS;
}
break;
case LEADZEROSDOT:
case DIGITSDOT:
{
zeroCount = 0;
_digits.Append(c);
state = DIGITSDOT;
}
break;
case MANTISSA:
case MANTISSADIGIT:
{
state = MANTISSADIGIT;
foundMantissaDigit = true;
}
break;
default:
{
throw new Exception(
"Unexpected character '" + c + "' at position " + i
);
}
}
}
break;
case 'E':
case 'e':
{
switch (state)
{
case INITIAL:
case LEADZEROS:
case DIGITS:
case LEADZEROSDOT:
case DIGITSDOT:
{
// record the starting point of the mantissa
// so we can do a substring to get it back later
mantissaStart = i + 1;
state = MANTISSA;
}
break;
default:
{
throw new Exception(
"Unexpected character '" + c + "' at position " + i
);
}
}
}
break;
default:
{
throw new Exception(
"Unexpected character '" + c + "' at position " + i
);
}
}
}
if (mantissaStart != -1)
{
// if we had found an 'E'
if (!foundMantissaDigit)
{
// we didn't actually find a mantissa to go with.
throw new Exception(
"No digits in mantissa."
);
}
// parse the mantissa.
mantissa += Convert.ToInt32(number.Substring(mantissaStart));
}
if (_digits.Length == 0)
{
if (zeroCount > 0)
{
// if nothing but zeros all zeros are significant.
for (int j = 0; j < zeroCount; j++)
{
_digits.Append('0');
}
mantissa += leadZeroCount;
isZero = true;
sign = true;
}
else
{
// a hack to catch some cases that we could catch
// by adding a ton of extra states. Things like:
// "e2" "+e2" "+." "." "+" etc.
throw new Exception(
"No digits in number."
);
}
}
}
public SignificantFigures SetNumberSignificantFigures(int significantFigures)
{
if (significantFigures <= 0)
throw new ArgumentException("Desired number of significant figures must be positive.");
if (_digits != null)
{
int length = _digits.Length;
if (length < significantFigures)
{
// number is not long enough, pad it with zeros.
for (int i = length; i < significantFigures; i++)
{
_digits.Append('0');
}
}
else if (length > significantFigures)
{
// number is too long chop some of it off with rounding.
bool addOne; // we need to round up if true.
char firstInSig = _digits[significantFigures];
if (firstInSig < '5')
{
// first non-significant digit less than five, round down.
addOne = false;
}
else if (firstInSig == '5')
{
// first non-significant digit equal to five
addOne = false;
for (int i = significantFigures + 1; !addOne && i < length; i++)
{
// if its followed by any non-zero digits, round up.
if (_digits[i] != '0')
{
addOne = true;
}
}
if (!addOne)
{
// if it was not followed by non-zero digits
// if the last significant digit is odd round up
// if the last significant digit is even round down
addOne = (_digits[significantFigures - 1] & 1) == 1;
}
}
else
{
// first non-significant digit greater than five, round up.
addOne = true;
}
// loop to add one (and carry a one if added to a nine)
// to the last significant digit
for (int i = significantFigures - 1; addOne && i >= 0; i--)
{
char digit = _digits[i];
if (digit < '9')
{
_digits[i] = (char) (digit + 1);
addOne = false;
}
else
{
_digits[i] = '0';
}
}
if (addOne)
{
// if the number was all nines
_digits.Insert(0, '1');
mantissa++;
}
// chop it to the correct number of figures.
_digits.Length = significantFigures;
}
}
return this;
}
public double ToDouble()
{
return Convert.ToDouble(original);
}
public static String Format(double number, int significantFigures)
{
SignificantFigures sf = new SignificantFigures(number);
sf.SetNumberSignificantFigures(significantFigures);
return sf.ToString();
}
}
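When the first dropped digit is exactly 5 with only zeros after it, SetNumberSignificantFigures rounds so that the last kept digit is even (round-half-to-even, also called banker's rounding). For comparison, .NET exposes the same tie-breaking rule as MidpointRounding.ToEven. This short sketch shows it on decimal values, where the ties are exact; the class name is illustrative only:

```csharp
using System;

public static class TieBreakingDemo
{
    // Round an exact tie both ways. ToEven matches the "odd digit rounds up,
    // even digit rounds down" rule above; AwayFromZero always rounds the 5 up
    // in magnitude.
    public static (decimal toEven, decimal awayFromZero) RoundTie(decimal value) =>
        (Math.Round(value, MidpointRounding.ToEven),
         Math.Round(value, MidpointRounding.AwayFromZero));
}
```

For example, 12.5m rounds to 12 with ToEven but to 13 with AwayFromZero, while 13.5m rounds to 14 either way, just as the digit strings "125" and "135" become "12" and "14" at two significant figures.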
I have a shorter answer for calculating the significant figures of a number. Here is the code & the test results...
using System;
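using System.Collections.Generic;
using System.Globalization;
namespace ConsoleApplicationRound
{
class Program
{
static void Main(string[] args)
{
// Take the decimal separator from the current culture ('.' for English, ',' for German),
// the same culture Convert.ToString(double) uses inside Maths.Round.
char cDecimal = CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator[0];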
List<double> l_dValue = new List<double>();
ushort usSignificants = 5;
l_dValue.Add(0);
l_dValue.Add(0.000640589);
l_dValue.Add(-0.000640589);
l_dValue.Add(-123.405009);
l_dValue.Add(123.405009);
l_dValue.Add(-540);
l_dValue.Add(540);
l_dValue.Add(-540911);
l_dValue.Add(540911);
l_dValue.Add(-118.2);
l_dValue.Add(118.2);
l_dValue.Add(-118.18);
l_dValue.Add(118.18);
l_dValue.Add(-118.188);
l_dValue.Add(118.188);
foreach (double d in l_dValue)
{
Console.WriteLine("d = Maths.Round('" +
cDecimal + "', " + d + ", " + usSignificants +
") = " + Maths.Round(
cDecimal, d, usSignificants));
}
Console.Read();
}
}
}
The Maths class used is as follows:
using System;
using System.Text;
namespace ConsoleApplicationRound
{
class Maths
{
/// <summary>
/// A run of zeros used to pad the result with trailing zeros
/// </summary>
private static String m_strZeros = "000000000000000000000000000000000";
/// <summary>
/// The minus sign
/// </summary>
public const char m_cDASH = '-';
/// <summary>
/// Determines the number of digits before the decimal point
/// </summary>
/// <param name="cDecimal">
/// Language-specific decimal separator
/// </param>
/// <param name="strValue">
/// Value to be scrutinised
/// </param>
/// <returns>
/// Nr. of digits before the decimal point
/// </returns>
private static ushort NrOfDigitsBeforeDecimal(char cDecimal, String strValue)
{
short sDecimalPosition = (short)strValue.IndexOf(cDecimal);
ushort usSignificantDigits = 0;
if (sDecimalPosition >= 0)
{
strValue = strValue.Substring(0, sDecimalPosition + 1);
}
for (ushort us = 0; us < strValue.Length; us++)
{
if (strValue[us] != m_cDASH) usSignificantDigits++;
if (strValue[us] == cDecimal)
{
usSignificantDigits--;
break;
}
}
return usSignificantDigits;
}
/// <summary>
/// Rounds to a fixed number of significant digits
/// </summary>
/// <param name="d">
/// Number to be rounded
/// </param>
/// <param name="usSignificants">
/// Requested significant digits
/// </param>
/// <returns>
/// The rounded number
/// </returns>
public static String Round(char cDecimal,
double d,
ushort usSignificants)
{
StringBuilder value = new StringBuilder(Convert.ToString(d));
short sDecimalPosition = (short)value.ToString().IndexOf(cDecimal);
ushort usAfterDecimal = 0;
ushort usDigitsBeforeDecimalPoint =
NrOfDigitsBeforeDecimal(cDecimal, value.ToString());
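if (usDigitsBeforeDecimalPoint == 1)
{
// One digit before the separator (this includes 0.xxx values): derive the
// decimal places from the first significant digit, so that leading zeros
// after the separator are not counted. Math.Round allows at most 15 places.
usAfterDecimal = (d == 0)
? usSignificants
: (ushort)Math.Min(15, usSignificants - 1 - (int)Math.Floor(Math.Log10(Math.Abs(d))));
}
else
{
if (usSignificants >= usDigitsBeforeDecimalPoint)
{
usAfterDecimal =
(ushort)(usSignificants - usDigitsBeforeDecimalPoint);
}
else
{
// Round (rather than truncate) the surplus digits before the separator
double dPower = Math.Pow(10,
usDigitsBeforeDecimalPoint - usSignificants);
d = dPower*Math.Round(d/dPower);
}
}
double dRounded = Math.Round(d, usAfterDecimal);
StringBuilder result = new StringBuilder();
result.Append(dRounded);
// Count the significant digits of the result: drop sign and separator,
// then ignore leading zeros (a plain zero still counts as one digit).
String strDigits = result.ToString().Replace(
Convert.ToString(cDecimal), "").Replace(
Convert.ToString(m_cDASH), "").TrimStart('0');
ushort usDigits = (ushort)Math.Max(1, strDigits.Length);
// Add lagging zeros, if necessary:
if (usDigits < usSignificants && usAfterDecimal != 0)
{
if (result.ToString().IndexOf(cDecimal) == -1)
{
result.Append(cDecimal);
}
result.Append(m_strZeros.Substring(0, usSignificants - usDigits));
}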
return result.ToString();
}
}
}
Does anyone have a shorter version?
You can get elegant, bit-perfect rounding by using the GetBits method on Decimal and leveraging BigInteger to perform the masking.
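For context, decimal.GetBits returns four ints: elements 0 to 2 hold the 96-bit unsigned integer (low, middle and high word), and element 3 holds the scale (the power of ten to divide by) in bits 16 to 23 and the sign in bit 31. That is why the code below can round the 96-bit integer with BigInteger and write the three words back while leaving bits[3] alone. A small sketch that decodes those fields (the names are illustrative, not part of the answer):

```csharp
using System;
using System.Numerics;

public static class DecimalBits
{
    // Split a decimal into its 96-bit integer, its scale and its sign,
    // following the documented layout of decimal.GetBits.
    public static (BigInteger integer, int scale, bool negative) Decode(decimal value)
    {
        int[] bits = decimal.GetBits(value);
        BigInteger integer = new BigInteger(unchecked((uint)bits[0]))
                           + (new BigInteger(unchecked((uint)bits[1])) << 32)
                           + (new BigInteger(unchecked((uint)bits[2])) << 64);
        int scale = (bits[3] >> 16) & 0xFF;             // bits 16-23: power of ten
        bool negative = (bits[3] & int.MinValue) != 0;  // bit 31: sign
        return (integer, scale, negative);
    }
}
```

For instance, 0.0012345m decodes to the integer 12345 with scale 7, which is the value the rounding code then divides by a power of ten.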
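Some utilities (they need using System, System.Linq and System.Numerics):
// Count decimal digits exactly; a floating-point BigInteger.Log10 can be off by one near powers of ten.
public static int CountDigits
(BigInteger number) => number.IsZero ? 1 : BigInteger.Abs(number).ToString().Length;
private static readonly BigInteger[] BigPowers10
= Enumerable.Range(0, 100)
.Select(v => BigInteger.Pow(10, v))
.ToArray();
// 32-bit mask used to split the rounded 96-bit integer back into decimal.GetBits words
private static readonly BigInteger BigUnitMask = uint.MaxValue;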
The main function
public static decimal RoundToSignificantDigits
(this decimal num,
short n)
{
var bits = decimal.GetBits(num);
var u0 = unchecked((uint)bits[0]);
var u1 = unchecked((uint)bits[1]);
var u2 = unchecked((uint)bits[2]);
var i = new BigInteger(u0)
+ (new BigInteger(u1) << 32)
+ (new BigInteger(u2) << 64);
var d = CountDigits(i);
var delta = d - n;
if (delta < 0)
return num;
var scale = BigPowers10[delta];
var div = i/scale;
var rem = i%scale;
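// Ties (a remainder of exactly scale/2) are rounded toward zero.
var up = rem > scale/2;
if (up)
div += 1;
var shifted = div*scale;
// Write the rounded 96-bit integer back; bits[3] (scale and sign) is unchanged.
bits[0] = unchecked((int)(uint) (shifted & BigUnitMask));
bits[1] = unchecked((int)(uint) (shifted>>32 & BigUnitMask));
bits[2] = unchecked((int)(uint) (shifted>>64 & BigUnitMask));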
return new decimal(bits);
}
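Test case 0:
public void RoundToSignificantDigits()
{
WMath.RoundToSignificantDigits(0.0012345m, 2).Should().Be(0.0012m);
WMath.RoundToSignificantDigits(0.0012645m, 2).Should().Be(0.0013m);
// The remaining cases pass double literals, so they rely on a double overload
// of WMath.RoundToSignificantDigits that is not shown in this answer.
WMath.RoundToSignificantDigits(0.040000000000000008, 6).Should().Be(0.04);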
WMath.RoundToSignificantDigits(0.040000010000000008, 6).Should().Be(0.04);
WMath.RoundToSignificantDigits(0.040000100000000008, 6).Should().Be(0.0400001);
WMath.RoundToSignificantDigits(0.040000110000000008, 6).Should().Be(0.0400001);
WMath.RoundToSignificantDigits(0.20000000000000004, 6).Should().Be(0.2);
WMath.RoundToSignificantDigits(0.10000000000000002, 6).Should().Be(0.1);
WMath.RoundToSignificantDigits(0.0, 6).Should().Be(0.0);
}
Test case 1:
public void RoundToSigFigShouldWork()
{
1.2m.RoundToSignificantDigits(1).Should().Be(1m);
0.01235668m.RoundToSignificantDigits(3).Should().Be(0.0124m);
0.01m.RoundToSignificantDigits(3).Should().Be(0.01m);
1.23456789123456789123456789m.RoundToSignificantDigits(4)
.Should().Be(1.235m);
1.23456789123456789123456789m.RoundToSignificantDigits(16)
.Should().Be(1.234567891234568m);
1.23456789123456789123456789m.RoundToSignificantDigits(24)
.Should().Be(1.23456789123456789123457m);
1.23456789123456789123456789m.RoundToSignificantDigits(27)
.Should().Be(1.23456789123456789123456789m);
}
I found this article with a quick search. Basically, it converts the number to a string and walks through its characters one at a time until it reaches the maximum number of significant digits. Will this work?
The following code doesn't quite meet the spec, since it doesn't try to round anything to the left of the decimal point. But it's simpler than anything else presented here (so far). I was quite surprised that C# doesn't have a built-in method to handle this.
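public static string SignificantDigits(double d, int digits=10)
{
// Number of digits left of the decimal point (zero or negative when |d| < 1).
int magnitude = (d == 0.0) ? 0 : (int)Math.Floor(Math.Log10(Math.Abs(d))) + 1;
// What is left of the digit budget becomes decimal places; integer digits are never rounded.
digits -= magnitude;
if (digits < 0)
digits = 0;
string fmt = "f" + digits.ToString();
return d.ToString(fmt);
}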
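If you also need the digits left of the decimal point rounded, one possible extension of the same idea is to divide by a power of ten whenever there are more integer digits than requested significant digits. This is only a sketch, not part of the original answer: the name SignificantDigitsRounded is hypothetical, and it formats with InvariantCulture (the original uses the current culture) so the output is predictable:

```csharp
using System;
using System.Globalization;

public static class SigFigFormatting
{
    // Hypothetical variant of SignificantDigits: same "F<n>" formatting for the
    // fractional case, but surplus integer digits are rounded to zeros.
    public static string SignificantDigitsRounded(double d, int digits = 10)
    {
        // Number of digits left of the decimal point (<= 0 when |d| < 1).
        int magnitude = (d == 0.0) ? 0 : (int)Math.Floor(Math.Log10(Math.Abs(d))) + 1;
        int decimals = digits - magnitude;
        if (decimals >= 0)
            return d.ToString("F" + decimals, CultureInfo.InvariantCulture);

        // More integer digits than requested: round to a multiple of 10^(-decimals).
        double scale = Math.Pow(10, -decimals);
        double rounded = Math.Round(d / scale, MidpointRounding.AwayFromZero) * scale;
        return rounded.ToString("F0", CultureInfo.InvariantCulture);
    }
}
```

With this sketch, 1234567 at 3 digits becomes "1230000" and -540911 at 5 digits becomes "-540910", while 0.012345 at 3 digits still gives "0.0123" through the unchanged "F" path.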
This method is dead simple, works with any number, positive or negative, and uses only a single transcendental function (Log10). The only difference from the requested behavior (which may or may not matter) is that it will not round the integer component. However, it is perfect for currency processing where you know the values stay within certain bounds, because you can use doubles for much faster processing than the dreadfully slow Decimal type.
public static double ToDecimal( this double x, int significantFigures = 15 ) {
// determine # of digits before & after the decimal
int digitsBeforeDecimal = (int)Math.Max( Math.Ceiling( Math.Log10( Math.Abs( x ) ) ), 0 ),
digitsAfterDecimal = Math.Max( significantFigures - digitsBeforeDecimal, 0 );
// round it off (Math.Round accepts 0 to 15 decimal places)
return Math.Round( x, digitsAfterDecimal );
}
As I remember it, "significant figures" means the number of digits after the decimal separator, so 3 significant digits for 0.012345 would be 0.012 and not 0.0123, but that doesn't really matter for the solution.
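To make the two readings concrete for 0.012345: rounding to three digits after the decimal point gives 0.012, whereas rounding to three significant figures, counted from the first non-zero digit as in the usual textbook definition, gives 0.0123. A small comparison sketch (class and method names are illustrative only):

```csharp
using System;

public static class RoundingReadings
{
    // Reading 1: a fixed number of digits after the decimal point.
    public static double DecimalPlaces(double value, int places) =>
        Math.Round(value, places);

    // Reading 2: significant figures, counted from the first non-zero digit.
    // Each leading zero after the point adds one decimal place. Only valid while
    // the computed place count stays within Math.Round's 0..15 range,
    // e.g. for 0 < |value| < 10 that is not extremely small.
    public static double ToSignificantFigures(double value, int figures) =>
        Math.Round(value, figures - 1 - (int)Math.Floor(Math.Log10(Math.Abs(value))));
}
```

DecimalPlaces(0.012345, 3) yields 0.012, while ToSignificantFigures(0.012345, 3) rounds to four decimal places and yields 0.0123.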
I also understand that you want to "nullify" the last digits to a certain degree if the number is > 1. You write that 12345 would become 12300, but I'm not sure whether you want 123456 to become 123000 or 123400. My solution does the latter. Instead of calculating the factor, you could of course use a small initialized array if you only have a couple of variations.
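The truncation described above boils down to integer division by 10^(amount - 1) followed by multiplication, which drops the trailing digits instead of rounding them. A minimal sketch of just that arithmetic (FactorTruncation and TruncateToFactor are hypothetical names; it works on long and assumes amount >= 1):

```csharp
using System;

public static class FactorTruncation
{
    // 10^(amount - 1): 1 for amount 1, 100 for amount 3, and so on.
    public static long Factor(int amount)
    {
        long factor = 1;
        for (int i = 1; i < amount; i++) factor *= 10;
        return factor;
    }

    // Integer division discards the remainder, so the trailing digits become
    // zeros (toward zero for negative numbers); nothing is rounded.
    public static long TruncateToFactor(long number, int amount)
    {
        long factor = Factor(amount);
        return number / factor * factor;
    }
}
```

With amount = 3 the factor is 100, so 12345 becomes 12300 and 123456 becomes 123400; 12399 also becomes 12300 because the digits are truncated rather than rounded.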
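private static string FormatToSignificantFigures(decimal number, int amount)
{
// requires using System.Globalization (CultureInfo, NumberFormatInfo)
if (number > 1)
{
// Truncate (not round) to a multiple of 10^(amount - 1);
// decimal.Truncate avoids the int overflow of casting large values.
int factor = Factor(amount);
return (decimal.Truncate(number/factor)*factor).ToString();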
}
NumberFormatInfo nfi = new CultureInfo("en-US", false).NumberFormat;
nfi.NumberDecimalDigits = amount;
return(number.ToString("F", nfi));
}
private static int Factor(int x)
{
return DoCalcFactor(10, x-1);
}
private static int DoCalcFactor(int x, int y)
{
if (y <= 0) return 1; // 10^0: avoids infinite recursion when amount == 1
if (y == 1) return x;
return 10*DoCalcFactor(x, y - 1);
}
Kind regards
Carsten
