Is there a way to reverse each 2 bytes of a file? - c#

Basic summary of what I'm trying to achieve (as far as I know, there isn't a built-in function that does exactly this):
I need to reverse every 2 bytes of a file. That is, read the file's bytes and swap each pair.
Example: 05 04 82 FF
Output: 04 05 FF 82
I have some idea of how to do it, but I know my attempts are WAY off.
To clarify, I'm trying to:
Take a bin file.
Read the bytes inside the file.
Reverse every 2 bytes inside that file, and close it.
If anyone can clear this up, that would be great.

There are many approaches you could take to achieve this.
Here is a fairly efficient streaming approach with low allocations, using all the characters we know and love: ArrayPool, Span<T>, and FileStream.
Note 1: Adjust the buffer size to something that suits your hardware if needed.
Note 2: This lacks basic sanity checks and fault tolerance; it will also die miserably if the size of the file isn't divisible by 2.
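If you want a minimal guard for that last case (an illustrative sketch, not part of the original answer), you could check the length up front:
// Illustrative only: the pairwise swap assumes complete 2-byte pairs.
var length = new FileInfo(fileName).Length;
if (length % 2 != 0)
throw new InvalidDataException($"File length {length} is not a multiple of 2.");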
Given
private static ArrayPool<byte> pool = ArrayPool<byte>.Shared;
private const int BufferSize = 4096;
public static void Swap(string fileName)
{
var tempFileName = Path.ChangeExtension(fileName, "bob");
var buffer = pool.Rent(BufferSize);
try
{
var span = new Span<byte>(buffer);
using var oldFs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.None, BufferSize);
using var newFs = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.None, BufferSize);
var size = 0;
while ((size = oldFs.Read(span)) > 0)
{
for (var i = 0; i < size; i += 2)
{
var temp = span[i];
span[i] = span[i + 1];
span[i + 1] = temp;
}
newFs.Write(span.Slice(0,size));
}
}
finally
{
pool.Return(buffer);
}
File.Move(tempFileName, fileName,true);
}
Test
File.WriteAllText(@"D:\Test1.txt", "1234567890abcdef");
Swap(@"D:\Test1.txt");
var result = File.ReadAllText(@"D:\Test1.txt");
Console.WriteLine(result == "2143658709badcfe");
Output
True
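As an aside, swapping each 2-byte pair is equivalent to reversing the endianness of 16-bit words, so the inner loop of Swap could also be written with MemoryMarshal.Cast and BinaryPrimitives.ReverseEndianness (a sketch, assuming the read size is always even; needs System.Buffers.Binary and System.Runtime.InteropServices):
// Reinterpret the filled portion of the buffer as ushorts and byte-swap each one.
var shorts = MemoryMarshal.Cast<byte, ushort>(span.Slice(0, size));
for (var i = 0; i < shorts.Length; i++)
shorts[i] = BinaryPrimitives.ReverseEndianness(shorts[i]);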
Benchmarks
This was just a simple benchmark comparing the current solution with a simple array approach and pointers, varying the buffer size as you might do to increase HDD throughput. Technically it's only benchmarking one pass through a 10 MB data block; however, the allocations would skyrocket if the methods were run more than once.
Environment
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1198 (1909/November2018Update/19H2)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
.NET Core SDK=5.0.100
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT [AttachedDebugger]
.NET Core 5.0 : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
Job=.NET Core 5.0 Runtime=.NET Core 5.0
Results
| Method | N | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------- |------ |------------:|---------:|---------:|-------:|------:|------:|----------:|
| SwapSpanPool | 4094 | 25.89 ns | 0.078 ns | 0.069 ns | - | - | - | - |
| SwapArray | 4094 | 157.70 ns | 0.516 ns | 0.483 ns | 0.4923 | - | - | 4120 B |
| SwapUnsafe | 4094 | 154.71 ns | 0.293 ns | 0.274 ns | 0.4923 | - | - | 4120 B |
|------------- |------ |------------:|---------:|---------:|-------:|------:|------:|----------:|
| SwapSpanPool | 16384 | 25.82 ns | 0.048 ns | 0.043 ns | - | - | - | - |
| SwapArray | 16384 | 520.62 ns | 1.186 ns | 1.109 ns | 1.9569 | - | - | 16408 B |
| SwapUnsafe | 16384 | 518.82 ns | 1.361 ns | 1.273 ns | 1.9569 | - | - | 16408 B |
|------------- |------ |------------:|---------:|---------:|-------:|------:|------:|----------:|
| SwapSpanPool | 65536 | 25.81 ns | 0.049 ns | 0.043 ns | - | - | - | - |
| SwapArray | 65536 | 1,840.41 ns | 5.792 ns | 5.418 ns | 7.8106 | - | - | 65560 B |
| SwapUnsafe | 65536 | 1,846.57 ns | 3.715 ns | 3.475 ns | 7.8106 | - | - | 65560 B |
Setup
[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.NetCoreApp50)]
public class DumbTest
{
private static readonly ArrayPool<byte> pool = ArrayPool<byte>.Shared;
private MemoryStream ms1;
private MemoryStream ms2;
[Params(4094, 16384, 65536)] public int N;
[GlobalSetup]
public void Setup()
{
var data = new byte[10 * 1024 * 1024];
new Random(42).NextBytes(data);
ms1 = new MemoryStream(data);
ms2 = new MemoryStream(new byte[10 * 1024 * 1024]);
}
public void SpanPool()
{
var buffer = pool.Rent(N);
try
{
var span = new Span<byte>(buffer);
var size = 0;
while ((size = ms1.Read(span)) > 0)
{
for (var i = 0; i < size; i += 2)
{
var temp = span[i];
span[i] = span[i + 1];
span[i + 1] = temp;
}
ms2.Write(span.Slice(0, size));
}
}
finally
{
pool.Return(buffer);
}
}
public void Array()
{
var buffer = new byte[N];
var size = 0;
while ((size = ms1.Read(buffer)) > 0)
{
for (var i = 0; i < size; i += 2)
{
var temp = buffer[i];
buffer[i] = buffer[i + 1];
buffer[i + 1] = temp;
}
ms2.Write(buffer, 0, size);
}
}
public unsafe void Unsafe()
{
var buffer = new byte[N];
fixed (byte* p = buffer)
{
var size = 0;
while ((size = ms1.Read(buffer)) > 0)
{
for (var i = 0; i < size; i += 2)
{
var temp = p[i];
p[i] = p[i + 1];
p[i + 1] = temp;
}
ms2.Write(buffer, 0, size);
}
}
}
[Benchmark]
public void SwapSpanPool()
{
SpanPool();
}
[Benchmark]
public void SwapArray()
{
Array();
}
[Benchmark]
public void SwapUnsafe()
{
Unsafe();
}
}

Related

How to write to file with a pointer (byte*) to an array of byte

I am saving many large files for a mobile OS, so I am assuming I should use byte* to save memory and allocations.
Example Code
private unsafe static void WriteMesh(ref MeshInfo mesh, BinaryWriter bw)
{
var size = UnsafeUtility.SizeOf<float3>() * mesh.vertices.Length;
byte* pByte = (byte*)mesh.vertices.GetUnsafePtr();
bw.Write(pByte); // Obviously this won't work
}
I know I can do this with Span<T>, but I am in Unity which doesn't support it. Is there another way?
Disregarding any other problems (conceptual or otherwise), there are a few ways to do this.
Convoluted examples ensue
If you could use Span<T>, which can take a pointer and a length, you could then use the FileStream.Write(ReadOnlySpan<Byte>) overload:
Writes a sequence of bytes from a read-only span to the current file
stream and advances the current position within this file stream by
the number of bytes written.
var bytes = new byte[] {1,2,3};
var size = bytes.Length;
using var fs = new FileStream(@"SomeAwesomeFileNamedBob.dat", FileMode.Create);
fixed (byte* p = bytes)
{
var span = new Span<byte>(p, size);
fs.Write(span);
}
Or, just use BinaryWriter.Write and write each byte; this is a̶ ̶l̶i̶t̶t̶l̶e̶ ... extremely inefficient.
Writes an unsigned byte to the current stream and advances the stream
position by one byte.
var bytes = new byte[] {1, 2, 3};
var size = bytes.Length;
using var fs = new FileStream(@"SomeAwesomeFileNamedBob.dat", FileMode.Create);
using var bw = new BinaryWriter(fs);
fixed (byte* p = bytes)
for (int i = 0; i < size; i++)
bw.Write(p[i]);
Or, at the cost of an allocation, just Buffer.MemoryCopy to a new array and Write
Copies a block of memory.
var bytes = new byte[] {1,2,3};
var size = bytes.Length;
using var fs = new FileStream(@"SomeAwesomeFileNamedBob.dat", FileMode.Create);
var temp = new byte[size];
fixed (byte* pOld = bytes, pNew = temp)
{
Buffer.MemoryCopy(pOld, pNew, size, size);
fs.Write(temp, 0, size);
}
Or, expanding on the array copy method, you could use an ArrayPool<Byte> for fewer allocations, which in turn will be better for your LOH (if applicable):
Provides a resource pool that enables reusing instances of type T[].
private static readonly ArrayPool<byte> _pool = ArrayPool<byte>.Shared;
...
var size = bytes.Length;
using var fs = new FileStream(@"SomeAwesomeFileNamedBob.dat", FileMode.Create);
var temp = _pool.Rent(size);
try
{
fixed (byte* pOld = bytes, pNew = temp)
{
Buffer.MemoryCopy(pOld, pNew, size, size);
fs.Write(temp, 0, size);
}
}
finally
{
_pool.Return(temp);
}
Or you could use an UnmanagedMemoryStream
Provides access to unmanaged blocks of memory from managed code.
Important
This API is not CLS-compliant.
var bytes = new byte[] {1,2,3};
var size = bytes.Length;
using var fs = new FileStream(@"SomeAwesomeFileNamedBob.dat", FileMode.Create);
fixed (byte* p = bytes)
{
using var us = new UnmanagedMemoryStream(p,size);
us.CopyTo(fs);
}
Benchmarks
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.630 (2004/?/20H1)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT [AttachedDebugger]
.NET Core 5.0 : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
Job=.NET Core 5.0 Runtime=.NET Core 5.0
| Method | _size | Mean | Error | StdDev | Median |
|---------------- |------- |-------------:|-------------:|-------------:|-------------:|
| Span | 1000 | 122.4 ns | 2.46 ns | 2.42 ns | 122.7 ns |
| Single | 1000 | 5,548.3 ns | 82.61 ns | 73.23 ns | 5,561.8 ns |
| NewArray | 1000 | 230.4 ns | 4.64 ns | 10.56 ns | 227.4 ns |
| ArrayPool | 1000 | 185.6 ns | 3.74 ns | 4.60 ns | 186.1 ns |
| UnmanagedStream | 1000 | 249.8 ns | 4.89 ns | 8.69 ns | 247.5 ns |
|---------------- |------- |-------------:|-------------:|-------------:|-------------:|
| Span | 10000 | 1,012.9 ns | 20.06 ns | 44.87 ns | 1,007.0 ns |
| Single | 10000 | 56,143.2 ns | 980.01 ns | 1,436.48 ns | 56,087.6 ns |
| NewArray | 10000 | 2,086.1 ns | 43.89 ns | 127.34 ns | 2,048.9 ns |
| ArrayPool | 10000 | 1,277.2 ns | 24.38 ns | 50.88 ns | 1,272.3 ns |
| UnmanagedStream | 10000 | 1,267.8 ns | 24.52 ns | 28.24 ns | 1,260.9 ns |
|---------------- |------- |-------------:|-------------:|-------------:|-------------:|
| Span | 100000 | 56,843.0 ns | 1,107.92 ns | 1,137.75 ns | 56,587.5 ns |
| Single | 100000 | 601,186.9 ns | 11,991.48 ns | 17,576.95 ns | 598,002.9 ns |
| NewArray | 100000 | 111,234.1 ns | 1,296.51 ns | 1,012.23 ns | 111,268.3 ns |
| ArrayPool | 100000 | 59,183.1 ns | 278.01 ns | 232.15 ns | 59,141.8 ns |
| UnmanagedStream | 100000 | 58,539.6 ns | 941.79 ns | 834.87 ns | 58,176.1 ns |
Setup
[SimpleJob(RuntimeMoniker.NetCoreApp50)]
public unsafe class DumbTest
{
[Params(1000, 10000, 100000)] public int _size;
private byte* _p;
private GCHandle _handle;
private readonly ArrayPool<byte> _pool = ArrayPool<byte>.Shared;
[GlobalSetup]
public void Setup()
{
var bytes = new byte[_size];
new Random(42).NextBytes(bytes);
_handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
_p = (byte*) _handle.AddrOfPinnedObject();
}
[GlobalCleanup]
public void Cleanup() => _handle.Free();
[Benchmark]
public void Span()
{
using var ms = new MemoryStream();
var span = new Span<byte>(_p, _size);
ms.Write(span);
}
[Benchmark]
public void Single()
{
using var ms = new MemoryStream();
using var bw = new BinaryWriter(ms);
for (var i = 0; i < _size; i++)
bw.Write(*_p);
}
[Benchmark]
public void NewArray()
{
using var ms = new MemoryStream();
var temp = new byte[_size];
fixed (byte* pNew = temp)
{
Buffer.MemoryCopy(_p, pNew, _size, _size);
ms.Write(temp, 0, _size);
}
}
[Benchmark]
public void ArrayPool()
{
using var ms = new MemoryStream();
var temp = _pool.Rent(_size);
try
{
fixed (byte* pNew = temp)
{
Buffer.MemoryCopy(_p,pNew,_size,_size);
ms.Write(temp,0,_size);
}
}
finally
{
_pool.Return(temp);
}
}
[Benchmark]
public void UnmanagedStream()
{
using var ms = new MemoryStream();
using var us = new UnmanagedMemoryStream(_p, _size);
us.CopyTo(ms);
}
}

Why is BitArray faster than array of bools?

I have this implementation of Sieve of Eratosthenes in C#:
public static BitArray Count()
{
const int halfSize = MaxSize / 2;
var mark = new BitArray(halfSize);
const int max = halfSize - 2;
var maxFactor = (int) Math.Sqrt(MaxSize + 1) / 2;
for (var i = 1; i <= maxFactor; ++i)
{
if (mark[i]) continue;
var p = i + i + 1;
var k = p * p >> 1;
for (; k <= max; k += p)
{
mark[k] = true;
}
}
return mark;
}
It gives results that are good enough for me. Nonetheless, I decided to test this algorithm using an array of bools, expecting it to use more memory but be faster. And to my surprise, that wasn't the result. BenchmarkDotNet on .NET Core 3.1 shows that the bool array is more than two times slower than BitArray. Considering that the latter uses more method calls and produces much longer asm (BitArray vs. bool array), how is this possible?
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| Method | Mean | Error | StdDev | Median | Min | Max | Op/s | Allocated |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| SieveBool | 294.7 ms | 4.00 ms | 3.74 ms | 293.5 ms | 290.8 ms | 304.0 ms | 3.393 | 33.38 MB |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| SieveBitArray | 130.2 ms | 1.03 ms | 0.97 ms | 130.3 ms | 128.5 ms | 132.1 ms | 7.680 | 4.17 MB |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
Results are similar when using fields instead of initializing arrays in methods (except there is no allocation of course).
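For reference, the bool[] variant being compared presumably looks something like this (a reconstruction; the question doesn't include it):
public static bool[] CountBool()
{
const int halfSize = MaxSize / 2;
// Same sieve, but one bool (one byte) per flag instead of one bit.
var mark = new bool[halfSize];
const int max = halfSize - 2;
var maxFactor = (int) Math.Sqrt(MaxSize + 1) / 2;
for (var i = 1; i <= maxFactor; ++i)
{
if (mark[i]) continue;
var p = i + i + 1;
var k = p * p >> 1;
for (; k <= max; k += p)
{
mark[k] = true;
}
}
return mark;
}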

Ways to improve string memory allocation

This question is more theoretical than practical, but still.
I've been looking for a chance to improve the following code from the string memory allocation standpoint:
/* Output for n = 3:
*
* ' #'
* ' ##'
* '###'
*
*/
public static string[] staircase(int n) {
string[] result = new string[n];
for(var i = 0; i < result.Length; i++) {
var spaces = string.Empty.PadLeft(n - i - 1, ' ');
var sharpes = string.Empty.PadRight(i + 1, '#');
result[i] = spaces + sharpes;
}
return result;
}
PadHelper is the method that is eventually called under the hood, twice per iteration.
So, correct me if I'm wrong, but it seems like memory is allocated at least 3 times per iteration.
Any code improvements will be highly appreciated.
how about:
result[i] = new string('#', i + 1).PadLeft(n)
?
Note that this still allocates two strings internally, but I honestly don't see that as a problem. The garbage collector will take care of it for you.
StringBuilder is always an answer when it comes to string allocations; I'm sure you know that so apparently you want something else. Well, since your strings are all the same length, you can declare a single char[] array, populate it every time (only requires changing one array element on each iteration) and then use the string(char[]) constructor:
public static string[] staircase(int n)
{
char[] buf = new char[n];
string[] result = new string[n];
for (var i = 0; i < n - 1; i++)
{
buf[i] = ' ';
}
for (var i = 0; i < n; i++)
{
buf[n - i - 1] = '#';
result[i] = new string(buf);
}
return result;
}
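A quick sanity check of this approach (hypothetical usage):
foreach (var line in staircase(3))
Console.WriteLine($"'{line}'");
// Prints:
// '  #'
// ' ##'
// '###'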
You can save on both allocations and speed by starting with a string that contains all the spaces and all the sharpes you're ever going to need, and then taking substrings from it, as follows:
public string[] Staircase2()
{
string allChars = new string(' ', n - 1) + new string('#', n); // n-1 spaces + n sharpes
string[] result = new string[n];
for (var i = 0; i < result.Length; i++)
result[i] = allChars.Substring(i, n);
return result;
}
I used BenchmarkDotNet to compare Staircase1 (your original approach) with Staircase2 (my approach above) from n=2 up to n=8; see the results below.
It shows that Staircase2 is always faster (see the Mean column), and it allocates fewer bytes starting from n=3.
| Method | n | Mean | Error | StdDev | Allocated |
|----------- |-- |------------:|-----------:|-----------:|----------:|
| Staircase1 | 2 | 229.36 ns | 4.3320 ns | 4.0522 ns | 92 B |
| Staircase2 | 2 | 92.00 ns | 0.7200 ns | 0.6735 ns | 116 B |
| Staircase1 | 3 | 375.06 ns | 3.3043 ns | 3.0908 ns | 156 B |
| Staircase2 | 3 | 114.12 ns | 2.8933 ns | 3.2159 ns | 148 B |
| Staircase1 | 4 | 507.32 ns | 3.8995 ns | 3.2562 ns | 236 B |
| Staircase2 | 4 | 142.78 ns | 1.4575 ns | 1.3634 ns | 196 B |
| Staircase1 | 5 | 650.03 ns | 15.1515 ns | 25.7284 ns | 312 B |
| Staircase2 | 5 | 169.25 ns | 1.9076 ns | 1.6911 ns | 232 B |
| Staircase1 | 6 | 785.75 ns | 16.9353 ns | 15.8413 ns | 412 B |
| Staircase2 | 6 | 195.91 ns | 2.9852 ns | 2.4928 ns | 292 B |
| Staircase1 | 7 | 919.15 ns | 11.4145 ns | 10.6771 ns | 500 B |
| Staircase2 | 7 | 237.55 ns | 4.6380 ns | 4.9627 ns | 332 B |
| Staircase1 | 8 | 1,075.66 ns | 26.7013 ns | 40.7756 ns | 620 B |
| Staircase2 | 8 | 255.50 ns | 2.6894 ns | 2.3841 ns | 404 B |
This doesn't mean that Staircase2 is the absolute best possible, but certainly there is a way that is better than the original.
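For what it's worth, on newer runtimes (.NET Core 2.1+) you could go a step further with string.Create and write each row straight into the new string's buffer; this is just a sketch under that assumption, not something I benchmarked:
public static string[] Staircase3(int n)
{
var result = new string[n];
for (var i = 0; i < n; i++)
{
// Fill n - row - 1 spaces, then row + 1 sharps, directly in the string's buffer.
result[i] = string.Create(n, i, (span, row) =>
{
span.Slice(0, span.Length - row - 1).Fill(' ');
span.Slice(span.Length - row - 1).Fill('#');
});
}
return result;
}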
You can project your desired results using the LINQ Select method. For example, something like this:
public static string[] staircase(int n) {
return Enumerable.Range(1, n).Select(i => new string('#', i).PadLeft(n)).ToArray();
}
Alternate approach using an int array:
public static string[] staircase(int n) {
return (new int[n]).Select((x,i) => new string('#', i+1).PadLeft(n)).ToArray();
}
HTH

Slow Regex Split

I am parsing a large quantity of data (over 2 GB), and my regex search is quite slow. Is there any way to improve it?
Slow Code
string file_content = "4980: 01:06:59.140 - SomeLargeQuantityOfLogEntries";
List<string> split_content = Regex.Split(file_content, @"\s+(?=\d+: \d{2}:\d{2}:\d{2}\.\d{3} - )").ToList();
The way the program works is as follows:
1. It loads all the data into a string.
2. The above line of code is used to split the string into log entries and store each entry in a list. (This is the slow part that I would like to optimize.)
3. Log entries are denoted by the Regex pattern shown above.
In the answer below I put a couple of optimizations which you may use.
tl;dr: speed up log parsing 6x by iterating over the lines and using a custom parsing method (not Regex).
Measurements
Before we attempt any optimizations, I'd propose defining how we are going to measure their impact and value.
For benchmarking I'll use the BenchmarkDotNet framework. Create a console application:
static void Main(string[] args)
{
BenchmarkRunner.Run<LogReaderBenchmarks>();
BenchmarkRunner.Run<LogParserBenchmarks>();
BenchmarkRunner.Run<LogBenchmarks>();
Console.ReadLine();
return;
}
Run the command below in the Package Manager Console to add the NuGet package:
Install-Package BenchmarkDotNet -Version 0.11.5
The test data generator looks like this; run it once and then just reuse that temp file across your benchmarks:
public static class LogFilesGenerator {
public static void GenerateLogFile(string location)
{
var sizeBytes = 512*1024*1024; // 512MB
var line = new StringBuilder();
using (var f = new StreamWriter(location))
{
for (long z = 0; z < sizeBytes; z += line.Length)
{
line.Clear();
line.Append($"{z}: {DateTime.UtcNow.TimeOfDay.ToString(#"hh\:mm\:ss\.fff")} - ");
for (var l = -1; l < z % 3; l++)
line.AppendLine(Guid.NewGuid().ToString());
f.WriteLine(line);
}
f.Close();
}
}
}
Reading file
As commenters pointed out, it is very inefficient to read the whole file into memory (the GC will be very unhappy), so let's read it line by line.
The simplest way to achieve this is the File.ReadLines() method, which returns a non-materialized enumerable: you read the file while you are iterating over it.
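In its simplest form (a minimal sketch; the path and per-line handler are illustrative):
// Streams the file lazily; only the current line is held in memory.
foreach (var line in File.ReadLines(@"C:\logs\huge.log"))
ProcessLine(line); // hypothetical handler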
You can also read the file asynchronously, as explained here. This is a rather useless approach, as I still merge everything into a single string, so I'm speculating a bit when I comment on the results :)
| Method | buffer | Mean | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------- |------- |--------:|------------:|-----------:|----------:|----------:|
| ReadFileToMemory | ? | 1.919 s | 181000.0000 | 93000.0000 | 6000.0000 | 2.05 GB |
| ReadFileEnumerating | ? | 1.881 s | 314000.0000 | - | - | 1.38 GB |
| ReadFileToMemoryAsync | 4096 | 9.254 s | 248000.0000 | 68000.0000 | 6000.0000 | 1.92 GB |
| ReadFileToMemoryAsync | 16384 | 5.632 s | 215000.0000 | 61000.0000 | 6000.0000 | 1.72 GB |
| ReadFileToMemoryAsync | 65536 | 3.499 s | 196000.0000 | 54000.0000 | 4000.0000 | 1.62 GB |
[RyuJitX64Job]
[MemoryDiagnoser]
[IterationCount(1), InnerIterationCount(1), WarmupCount(0), InvocationCount(1), ProcessCount(1)]
[StopOnFirstError]
public class LogReaderBenchmarks
{
string file = @"C:\Users\Admin\AppData\Local\Temp\tmp6483.tmp";
[GlobalSetup()]
public void Setup()
{
//file = Path.GetTempFileName(); <---- uncomment these lines to generate file first time.
//Console.WriteLine(file);
//LogFilesGenerator.GenerateLogFile(file);
}
[Benchmark(Baseline = true)]
public string ReadFileToMemory() => File.ReadAllText(file);
[Benchmark]
[Arguments(1024*4)]
[Arguments(1024 * 16)]
[Arguments(1024 * 64)]
public async Task<string> ReadFileToMemoryAsync(int buffer) => await ReadTextAsync(file, buffer);
[Benchmark]
public int ReadFileEnumerating() => File.ReadLines(file).Select(l => l.Length).Max();
private async Task<string> ReadTextAsync(string filePath, int bufferSize)
{
using (FileStream sourceStream = new FileStream(filePath,
FileMode.Open, FileAccess.Read, FileShare.Read,
bufferSize: bufferSize, useAsync: true))
{
StringBuilder sb = new StringBuilder();
byte[] buffer = new byte[bufferSize];
int numRead;
while ((numRead = await sourceStream.ReadAsync(buffer, 0, buffer.Length)) != 0)
{
string text = Encoding.Unicode.GetString(buffer, 0, numRead);
sb.Append(text);
}
return sb.ToString();
}
}
}
As you can see, ReadFileEnumerating is the fastest. It allocates a comparable amount of memory to ReadFileToMemory, but it is all in Gen 0, so the GC can collect it faster, and peak memory consumption is much smaller than ReadFileToMemory's.
Async reads do not give any performance gain here. If you need throughput, don't use them.
Split log entries
Regex is slow and memory-hungry; passing it a huge string will make your application slow. You can mitigate this problem by checking each line of the file against your Regex, though you then need to reconstruct the whole log entry in case it spans multiple lines.
You can also introduce a more efficient method for matching your string; see customParseMatch, for example. I don't claim it is the most efficient (you could write a separate benchmark for the predicate), but it already shows a good result compared to Regex: it is 10 times faster.
| Method | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |---------:|------:|------------:|------------:|----------:|----------:|
| SplitByRegex | 24.191 s | 1.00 | 426000.0000 | 119000.0000 | 4000.0000 | 2.65 GB |
| SplitByRegexIterating | 16.302 s | 0.67 | 176000.0000 | 88000.0000 | 1000.0000 | 2.05 GB |
| SplitByCustomParseIterating | 2.385 s | 0.10 | 398000.0000 | - | - | 1.75 GB |
[RyuJitX64Job]
[MemoryDiagnoser]
[IterationCount(1), InnerIterationCount(1), WarmupCount(0), InvocationCount(1), ProcessCount(1)]
[StopOnFirstError]
public class LogParserBenchmarks
{
public static string file = @"C:\Users\Admin\AppData\Local\Temp\tmp6483.tmp";
string[] lines;
string text;
public static Regex split_regex = new Regex(@"\s+(?=\d+: \d{2}:\d{2}:\d{2}\.\d{3} - )");
[GlobalSetup()]
public void Setup()
{
lines = File.ReadAllLines(file);
text = File.ReadAllText(file);
}
[Benchmark(Baseline = true)]
public string[] SplitByRegex() => split_regex.Split(text);
[Benchmark]
public int SplitByRegexIterating() =>
parseLogEntries(lines, split_regex.IsMatch).Count();
[Benchmark]
public int SplitByCustomParseIterating() =>
parseLogEntries(lines, customParseMatch).Count();
public static bool customParseMatch(string line)
{
var refinedLine = line.TrimStart();
var colonIndex = refinedLine.IndexOf(':');
if (colonIndex < 0) return false;
if (!int.TryParse(refinedLine.Substring(0,colonIndex), out var _)) return false;
if (refinedLine[colonIndex + 1] != ' ') return false;
if (!TimeSpan.TryParseExact(refinedLine.Substring(colonIndex + 2, 12), @"hh\:mm\:ss\.fff", CultureInfo.InvariantCulture, out var _)) return false;
return true;
}
public static IEnumerable<string> parseLogEntries(IEnumerable<string> lines, Predicate<string> entryMatched)
{
StringBuilder builder = new StringBuilder();
foreach (var line in lines)
{
if (entryMatched(line) && builder.Length > 0)
{
yield return builder.ToString();
builder.Clear();
}
builder.AppendLine(line);
}
if (builder.Length > 0)
yield return builder.ToString();
}
}
Parallelism
If your log entries can span multiple lines, parallelizing this is not a trivial task, and I'd leave it to other members to provide the code.
Summary
So iterating over each line and using a custom parse function gives us the best results so far. Let's make a final benchmark and check how much we gained:
| Method | Mean | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |---------:|------------:|------------:|----------:|----------:|
| ReadTextAndSplitByRegex | 29.070 s | 601000.0000 | 198000.0000 | 2000.0000 | 4.7 GB |
| ReadLinesAndSplitByFunction | 4.117 s | 713000.0000 | - | - | 3.13 GB |
[RyuJitX64Job]
[MemoryDiagnoser]
[IterationCount(1), InnerIterationCount(1), WarmupCount(0), InvocationCount(1), ProcessCount(1)]
[StopOnFirstError]
public class LogBenchmarks
{
[Benchmark(Baseline = true)]
public string[] ReadTextAndSplitByRegex()
{
var text = File.ReadAllText(LogParserBenchmarks.file);
return LogParserBenchmarks.split_regex.Split(text);
}
[Benchmark]
public int ReadLinesAndSplitByFunction()
{
var lines = File.ReadLines(LogParserBenchmarks.file);
var entries = LogParserBenchmarks.parseLogEntries(lines, LogParserBenchmarks.customParseMatch);
return entries.Count();
}
}
I'm not going to try to improve on Fenixil's excellent and thorough answer, but I would like to point out that while regular expressions are great for some things, as is already apparent they aren't particularly efficient. Breaking the given regex down (the RegexBuddy tool shows this nicely) makes it clear that it takes a fair bit of work to match. The link How a Regex Engine Works Internally explains the process further.
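If you do stay with Regex, one cheap mitigation worth trying (a sketch; it reduces interpretation overhead but won't remove the fundamental lookahead cost) is compiling the pattern once up front:
// Compiled once to IL and reused, instead of re-interpreting the pattern on every Split.
private static readonly Regex LogEntrySplitter = new Regex(
@"\s+(?=\d+: \d{2}:\d{2}:\d{2}\.\d{3} - )",
RegexOptions.Compiled);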

Convert a float-formated char[] to float

I have a char[] salary which contains data that comes from a string. I want to convert char[] salary to float, but it seems to be extremely slow with the method I'm trying, which is:
float ff = float.Parse(new string(salary));
According to Visual Studio's Performance Profiler, this is taking way too much processing time.
So I'd like to know if there's a faster way to do this, since performance is a concern here.
The char[] is formated like so:
[ '1', '3', '2', ',', '2', '9']
It is basically a JSON-like float with each digit (and the comma) placed into a char[].
EDIT:
I've reformatted the code and it seems like the performance hit is actually in the conversion from char[] to string, not the parsing from string to float.
Since this question has changed from "What's the fastest way to parse a float?" to "What's the fastest way to get a string from a char[]?", I wrote some benchmarks with BenchmarkDotNet to compare the various methods. My finding is that, if you already have a char[], you can't get any faster than just passing it to the string(char[]) constructor like you're already doing.
You say that your input file is "read into a byte[], then the section of the byte[] that represents the float is extracted into a char[]." Since you have the bytes that make up the float text isolated in a byte[], perhaps you can improve performance by skipping the intermediate char[]. Assuming you have something equivalent to...
byte[] floatBytes = new byte[] { 0x31, 0x33, 0x32, 0x2C, 0x32, 0x39 }; // "132,29"
...you could use Encoding.GetString()...
string floatString = Encoding.ASCII.GetString(floatBytes);
...which is nearly twice as fast as passing the result of Encoding.GetChars() to the string(char[]) constructor...
char[] floatChars = Encoding.ASCII.GetChars(floatBytes);
string floatString = new string(floatChars);
You'll find those benchmarks listed last in my results...
BenchmarkDotNet=v0.11.0, OS=Windows 10.0.17134.165 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Max: 2.79GHz) (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2732436 Hz, Resolution=365.9738 ns, Timer=TSC
.NET Core SDK=2.1.202
[Host] : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
Clr : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0
Core : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
Method | Runtime | Categories | Mean | Scaled |
----------------------------------------------------- |-------- |----------------- |----------:|-------:|
String_Constructor_CharArray | Clr | char[] => string | 13.51 ns | 1.00 |
String_Concat | Clr | char[] => string | 192.87 ns | 14.27 |
StringBuilder_Local_AppendSingleChar_DefaultCapacity | Clr | char[] => string | 60.74 ns | 4.49 |
StringBuilder_Local_AppendSingleChar_ExactCapacity | Clr | char[] => string | 60.26 ns | 4.46 |
StringBuilder_Local_AppendAllChars_DefaultCapacity | Clr | char[] => string | 51.27 ns | 3.79 |
StringBuilder_Local_AppendAllChars_ExactCapacity | Clr | char[] => string | 49.51 ns | 3.66 |
StringBuilder_Field_AppendSingleChar | Clr | char[] => string | 51.14 ns | 3.78 |
StringBuilder_Field_AppendAllChars | Clr | char[] => string | 32.95 ns | 2.44 |
| | | | |
String_Constructor_CharPointer | Clr | void* => string | 29.28 ns | 1.00 |
String_Constructor_SBytePointer | Clr | void* => string | 89.21 ns | 3.05 |
UnsafeArrayCopy_String_Constructor | Clr | void* => string | 42.82 ns | 1.46 |
| | | | |
Encoding_GetString | Clr | byte[] => string | 37.33 ns | 1.00 |
Encoding_GetChars_String_Constructor | Clr | byte[] => string | 60.83 ns | 1.63 |
SafeArrayCopy_String_Constructor | Clr | byte[] => string | 27.55 ns | 0.74 |
| | | | |
String_Constructor_CharArray | Core | char[] => string | 13.27 ns | 1.00 |
String_Concat | Core | char[] => string | 172.17 ns | 12.97 |
StringBuilder_Local_AppendSingleChar_DefaultCapacity | Core | char[] => string | 58.68 ns | 4.42 |
StringBuilder_Local_AppendSingleChar_ExactCapacity | Core | char[] => string | 59.85 ns | 4.51 |
StringBuilder_Local_AppendAllChars_DefaultCapacity | Core | char[] => string | 40.62 ns | 3.06 |
StringBuilder_Local_AppendAllChars_ExactCapacity | Core | char[] => string | 43.67 ns | 3.29 |
StringBuilder_Field_AppendSingleChar | Core | char[] => string | 54.49 ns | 4.11 |
StringBuilder_Field_AppendAllChars | Core | char[] => string | 31.05 ns | 2.34 |
| | | | |
String_Constructor_CharPointer | Core | void* => string | 22.87 ns | 1.00 |
String_Constructor_SBytePointer | Core | void* => string | 83.11 ns | 3.63 |
UnsafeArrayCopy_String_Constructor | Core | void* => string | 35.30 ns | 1.54 |
| | | | |
Encoding_GetString | Core | byte[] => string | 36.19 ns | 1.00 |
Encoding_GetChars_String_Constructor | Core | byte[] => string | 58.99 ns | 1.63 |
SafeArrayCopy_String_Constructor | Core | byte[] => string | 27.81 ns | 0.77 |
...from running this code (requires BenchmarkDotNet assembly and compiling with /unsafe)...
using System;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using BenchmarkDotNet.Attributes;
namespace StackOverflow_51584129
{
[CategoriesColumn()]
[ClrJob()]
[CoreJob()]
[GroupBenchmarksBy(BenchmarkDotNet.Configs.BenchmarkLogicalGroupRule.ByCategory)]
public class StringCreationBenchmarks
{
private static readonly Encoding InputEncoding = Encoding.ASCII;
private const string InputString = "132,29";
private static readonly byte[] InputBytes = InputEncoding.GetBytes(InputString);
private static readonly char[] InputChars = InputString.ToCharArray();
private static readonly sbyte[] InputSBytes = InputBytes.Select(Convert.ToSByte).ToArray();
private GCHandle _inputBytesHandle;
private GCHandle _inputCharsHandle;
private GCHandle _inputSBytesHandle;
private StringBuilder _builder;
[Benchmark(Baseline = true)]
[BenchmarkCategory("char[] => string")]
public string String_Constructor_CharArray()
{
return new string(InputChars);
}
[Benchmark(Baseline = true)]
[BenchmarkCategory("void* => string")]
public unsafe string String_Constructor_CharPointer()
{
var pointer = (char*) _inputCharsHandle.AddrOfPinnedObject();
return new string(pointer);
}
[Benchmark()]
[BenchmarkCategory("void* => string")]
public unsafe string String_Constructor_SBytePointer()
{
var pointer = (sbyte*) _inputSBytesHandle.AddrOfPinnedObject();
return new string(pointer);
}
[Benchmark()]
[BenchmarkCategory("char[] => string")]
public string String_Concat()
{
return string.Concat(InputChars);
}
[Benchmark()]
[BenchmarkCategory("char[] => string")]
public string StringBuilder_Local_AppendSingleChar_DefaultCapacity()
{
var builder = new StringBuilder();
foreach (var c in InputChars)
builder.Append(c);
return builder.ToString();
}
[Benchmark()]
[BenchmarkCategory("char[] => string")]
public string StringBuilder_Local_AppendSingleChar_ExactCapacity()
{
var builder = new StringBuilder(InputChars.Length);
foreach (var c in InputChars)
builder.Append(c);
return builder.ToString();
}
[Benchmark()]
[BenchmarkCategory("char[] => string")]
public string StringBuilder_Local_AppendAllChars_DefaultCapacity()
{
var builder = new StringBuilder().Append(InputChars);
return builder.ToString();
}
[Benchmark()]
[BenchmarkCategory("char[] => string")]
public string StringBuilder_Local_AppendAllChars_ExactCapacity()
{
var builder = new StringBuilder(InputChars.Length).Append(InputChars);
return builder.ToString();
}
[Benchmark()]
[BenchmarkCategory("char[] => string")]
public string StringBuilder_Field_AppendSingleChar()
{
_builder.Clear();
foreach (var c in InputChars)
_builder.Append(c);
return _builder.ToString();
}
[Benchmark()]
[BenchmarkCategory("char[] => string")]
public string StringBuilder_Field_AppendAllChars()
{
return _builder.Clear().Append(InputChars).ToString();
}
[Benchmark(Baseline = true)]
[BenchmarkCategory("byte[] => string")]
public string Encoding_GetString()
{
return InputEncoding.GetString(InputBytes);
}
[Benchmark()]
[BenchmarkCategory("byte[] => string")]
public string Encoding_GetChars_String_Constructor()
{
var chars = InputEncoding.GetChars(InputBytes);
return new string(chars);
}
[Benchmark()]
[BenchmarkCategory("byte[] => string")]
public string SafeArrayCopy_String_Constructor()
{
var chars = new char[InputString.Length];
for (int i = 0; i < InputString.Length; i++)
chars[i] = Convert.ToChar(InputBytes[i]);
return new string(chars);
}
[Benchmark()]
[BenchmarkCategory("void* => string")]
public unsafe string UnsafeArrayCopy_String_Constructor()
{
fixed (char* chars = new char[InputString.Length])
{
var bytes = (byte*) _inputBytesHandle.AddrOfPinnedObject();
for (int i = 0; i < InputString.Length; i++)
chars[i] = Convert.ToChar(bytes[i]);
return new string(chars);
}
}
[GlobalSetup(Targets = new[] { nameof(StringBuilder_Field_AppendAllChars), nameof(StringBuilder_Field_AppendSingleChar) })]
public void SetupStringBuilderField()
{
_builder = new StringBuilder();
}
[GlobalSetup(Target = nameof(UnsafeArrayCopy_String_Constructor))]
public void SetupBytesHandle()
{
_inputBytesHandle = GCHandle.Alloc(InputBytes, GCHandleType.Pinned);
}
[GlobalCleanup(Target = nameof(UnsafeArrayCopy_String_Constructor))]
public void CleanupBytesHandle()
{
_inputBytesHandle.Free();
}
[GlobalSetup(Target = nameof(String_Constructor_CharPointer))]
public void SetupCharsHandle()
{
_inputCharsHandle = GCHandle.Alloc(InputChars, GCHandleType.Pinned);
}
[GlobalCleanup(Target = nameof(String_Constructor_CharPointer))]
public void CleanupCharsHandle()
{
_inputCharsHandle.Free();
}
[GlobalSetup(Target = nameof(String_Constructor_SBytePointer))]
public void SetupSByteHandle()
{
_inputSBytesHandle = GCHandle.Alloc(InputSBytes, GCHandleType.Pinned);
}
[GlobalCleanup(Target = nameof(String_Constructor_SBytePointer))]
public void CleanupSByteHandle()
{
_inputSBytesHandle.Free();
}
public static void Main(string[] args)
{
BenchmarkDotNet.Running.BenchmarkRunner.Run<StringCreationBenchmarks>();
}
}
}
On the float-parsing side of things, there are some gains to be had based on which overload of float.Parse() you call and what you pass to it. I ran some more benchmarks comparing these overloads (note that I changed the decimal separator character from ',' to '.' just so I could specify CultureInfo.InvariantCulture).
For example, calling an overload that takes an IFormatProvider is good for about a 10% performance increase. Specifying NumberStyles.Float ("lax") for the NumberStyles parameter effects a change in performance of about a percentage point in either direction, and, making some assumptions about our input data, specifying only NumberStyles.AllowDecimalPoint ("strict") nets a few points performance increase. (The float.Parse(string) overload uses NumberStyles.Float | NumberStyles.AllowThousands.)
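For instance, the "strict" variant from the benchmarks boils down to a call like this (assuming invariant-culture input containing only digits and a decimal point):
// Only a decimal point is allowed; thousands separators, exponents,
// signs, and whitespace would all cause this to throw.
float value = float.Parse("132.29", NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture);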
On the subject of making assumptions about your input data, if you know the text you're working with has certain characteristics (single-byte character encoding, no invalid numbers, no negatives, no exponents, no need to handle NaN or positive/negative infinity, etc.) you might do well to parse from the bytes directly and forego any unneeded special case handling and error checking. I included a very simple implementation in my benchmarks and it was able to get a float from a byte[] more than 16x faster than float.Parse(string) could get a float from a string!
Here are my benchmark results...
BenchmarkDotNet=v0.11.0, OS=Windows 10.0.17134.165 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Max: 2.79GHz) (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2732436 Hz, Resolution=365.9738 ns, Timer=TSC
.NET Core SDK=2.1.202
[Host] : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
Clr : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3131.0
Core : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
Method | Runtime | Mean | Scaled |
-------------------------------------------------------------- |-------- |-----------:|-------:|
float.Parse(string) | Clr | 145.098 ns | 1.00 |
'float.Parse(string, IFormatProvider)' | Clr | 134.191 ns | 0.92 |
'float.Parse(string, NumberStyles) [Lax]' | Clr | 145.884 ns | 1.01 |
'float.Parse(string, NumberStyles) [Strict]' | Clr | 139.417 ns | 0.96 |
'float.Parse(string, NumberStyles, IFormatProvider) [Lax]' | Clr | 133.800 ns | 0.92 |
'float.Parse(string, NumberStyles, IFormatProvider) [Strict]' | Clr | 127.413 ns | 0.88 |
'Custom byte-to-float parser [Indexer]' | Clr | 7.657 ns | 0.05 |
'Custom byte-to-float parser [Enumerator]' | Clr | 566.440 ns | 3.90 |
| | | |
float.Parse(string) | Core | 154.369 ns | 1.00 |
'float.Parse(string, IFormatProvider)' | Core | 138.668 ns | 0.90 |
'float.Parse(string, NumberStyles) [Lax]' | Core | 155.644 ns | 1.01 |
'float.Parse(string, NumberStyles) [Strict]' | Core | 150.221 ns | 0.97 |
'float.Parse(string, NumberStyles, IFormatProvider) [Lax]' | Core | 142.591 ns | 0.92 |
'float.Parse(string, NumberStyles, IFormatProvider) [Strict]' | Core | 135.000 ns | 0.87 |
'Custom byte-to-float parser [Indexer]' | Core | 12.673 ns | 0.08 |
'Custom byte-to-float parser [Enumerator]' | Core | 584.236 ns | 3.78 |
...from running this code (requires BenchmarkDotNet assembly)...
using System;
using System.Globalization;
using BenchmarkDotNet.Attributes;
namespace StackOverflow_51584129
{
[ClrJob()]
[CoreJob()]
public class FloatParsingBenchmarks
{
private const string InputString = "132.29";
private static readonly byte[] InputBytes = System.Text.Encoding.ASCII.GetBytes(InputString);
private static readonly IFormatProvider ParsingFormatProvider = CultureInfo.InvariantCulture;
private const NumberStyles LaxParsingNumberStyles = NumberStyles.Float;
private const NumberStyles StrictParsingNumberStyles = NumberStyles.AllowDecimalPoint;
private const char DecimalSeparator = '.';
[Benchmark(Baseline = true, Description = "float.Parse(string)")]
public float SystemFloatParse()
{
return float.Parse(InputString);
}
[Benchmark(Description = "float.Parse(string, IFormatProvider)")]
public float SystemFloatParseWithProvider()
{
return float.Parse(InputString, CultureInfo.InvariantCulture);
}
[Benchmark(Description = "float.Parse(string, NumberStyles) [Lax]")]
public float SystemFloatParseWithLaxNumberStyles()
{
return float.Parse(InputString, LaxParsingNumberStyles);
}
[Benchmark(Description = "float.Parse(string, NumberStyles) [Strict]")]
public float SystemFloatParseWithStrictNumberStyles()
{
return float.Parse(InputString, StrictParsingNumberStyles);
}
[Benchmark(Description = "float.Parse(string, NumberStyles, IFormatProvider) [Lax]")]
public float SystemFloatParseWithLaxNumberStylesAndProvider()
{
return float.Parse(InputString, LaxParsingNumberStyles, ParsingFormatProvider);
}
[Benchmark(Description = "float.Parse(string, NumberStyles, IFormatProvider) [Strict]")]
public float SystemFloatParseWithStrictNumberStylesAndProvider()
{
return float.Parse(InputString, StrictParsingNumberStyles, ParsingFormatProvider);
}
[Benchmark(Description = "Custom byte-to-float parser [Indexer]")]
public float CustomFloatParseByIndexing()
{
// FOR DEMONSTRATION PURPOSES ONLY!
// This code has been written for and only tested with
// parsing the ASCII string "132.29" in byte form
var currentIndex = 0;
var boundaryIndex = InputBytes.Length;
char currentChar;
var wholePart = 0;
while (currentIndex < boundaryIndex && (currentChar = (char) InputBytes[currentIndex++]) != DecimalSeparator)
{
var currentDigit = currentChar - '0';
wholePart = 10 * wholePart + currentDigit;
}
var fractionalPart = 0F;
var nextFractionalDigitScale = 0.1F;
while (currentIndex < boundaryIndex)
{
currentChar = (char) InputBytes[currentIndex++];
var currentDigit = currentChar - '0';
fractionalPart += currentDigit * nextFractionalDigitScale;
nextFractionalDigitScale *= 0.1F;
}
return wholePart + fractionalPart;
}
[Benchmark(Description = "Custom byte-to-float parser [Enumerator]")]
public float CustomFloatParseByEnumerating()
{
// FOR DEMONSTRATION PURPOSES ONLY!
// This code has been written for and only tested with
// parsing the ASCII string "132.29" in byte form
var wholePart = 0;
var enumerator = InputBytes.GetEnumerator();
while (enumerator.MoveNext())
{
var currentChar = (char) (byte) enumerator.Current;
if (currentChar == DecimalSeparator)
break;
var currentDigit = currentChar - '0';
wholePart = 10 * wholePart + currentDigit;
}
var fractionalPart = 0F;
var nextFractionalDigitScale = 0.1F;
while (enumerator.MoveNext())
{
var currentChar = (char) (byte) enumerator.Current;
var currentDigit = currentChar - '0';
fractionalPart += currentDigit * nextFractionalDigitScale;
nextFractionalDigitScale *= 0.1F;
}
return wholePart + fractionalPart;
}
public static void Main()
{
BenchmarkDotNet.Running.BenchmarkRunner.Run<FloatParsingBenchmarks>();
}
}
}
After some experiments and the tests above:
The fastest way to get a string from a char[] is using new string(char[]).
One more note, FYI: according to a Microsoft article, in the case of invalid input TryParse is the fastest way to parse a float, so think about it. In their measurements, TryParse takes only 0.5% of execution time, while Parse takes 18% and Convert takes 14%.
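A minimal TryParse sketch (assuming invariant-culture input; salary is the char[] from the question above):
// TryParse reports failure via its return value instead of throwing,
// which is what makes it cheap when invalid input is common.
if (float.TryParse(new string(salary), NumberStyles.Float, CultureInfo.InvariantCulture, out var parsed))
{
// use parsed
}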
Interesting topic for working out optimization details at home :) Good health to you all.
My goal was to convert an ASCII CSV matrix into a float matrix as fast as possible in C#. It turns out that string.Split()-ing each row and converting every term separately also introduces overhead. To overcome this, I modified BACON's solution to parse a whole row of floats at once, used like this:
var falist = new List<float[]>();
for (int row=0; row<sRowList.Count; row++)
{
var sRow = sRowList[row];
falist.Add(CustomFloatParseRowByIndexing(nTerms, sRow.ToCharArray(), '.'));
}
The code for my row parser variant is below. These are the benchmark results, converting a 40x31 matrix 1000 times:
Benchmark0: Split row and Parse each term to convert to float matrix dT=704 ms
Benchmark1: Split row and TryParse each term to convert to float matrix dT=640 ms
Benchmark2: Split row and CustomFloatParseByIndexing to convert terms to float matrix dT=211 ms
Benchmark3: Use CustomFloatParseRowByIndexing to convert rows to float matrix dT=120 ms
public float[] CustomFloatParseRowByIndexing(int nItems, char[] InputBytes, char DecimalSeparator)
{
// Convert semicolon-separated floats from InputBytes into nItems float[] result.
// Constraints are:
// - no scientific notation or .x allowed
// - every row has exactly nItems values
// - semicolon delimiter after each value
// - terms 'u' or 'undef' or 'undefined' allowed for bad values
// - minus sign allowed
// - leading space allowed
// - all terms must comply
// FOR DEMO PURPOSE ONLY
// based on BACON on Stackoverflow, modified to read nItems delimited float values
// https://stackoverflow.com/questions/51584129/convert-a-float-formated-char-to-float
var currentIndex = 0;
var boundaryIndex = InputBytes.Length;
bool termready;
float[] result = new float[nItems];
int cItem = 0;
while (currentIndex < boundaryIndex)
{
termready = false;
if ((char)InputBytes[currentIndex] == ' ') { currentIndex++; continue; }
char currentChar;
var wholePart = 0;
float sgn = 1;
while (currentIndex < boundaryIndex && (currentChar = (char)InputBytes[currentIndex++]) != DecimalSeparator)
{
if (currentChar == 'u')
{
while ((char)InputBytes[currentIndex++] != ';') ;
result[cItem++] = -9999.0f;
continue;
}
else
if (currentChar == ' ')
{
continue;
}
else
if (currentChar == ';')
{
termready = true;
break;
}
else
if (currentChar == '-') sgn = -1;
else
{
var currentDigit = currentChar - '0';
wholePart = 10 * wholePart + currentDigit;
}
}
var fractionalPart = 0F;
var nextFractionalDigitScale = 0.1F;
if (!termready)
while (currentIndex < boundaryIndex)
{
currentChar = (char)InputBytes[currentIndex++];
if (currentChar == ';')
{
termready = true;
break;
}
var currentDigit = currentChar - '0';
fractionalPart += currentDigit * nextFractionalDigitScale;
nextFractionalDigitScale *= 0.1F;
}
if (termready)
{
result[cItem++] = sgn * (wholePart + fractionalPart);
}
}
return result;
}
