I have this code
Open WritingPath & "\FplDb.txt" For Random As #1 Len = Len(WpRec)
For i = 1 To 99
WpRec.WpIndex = FplDB(i, 1)
WpRec.WpName = FplDB(i, 2)
WpRec.WpLat = FplDB(i, 3)
WpRec.WpLon = FplDB(i, 4)
WpRec.WpLatDir = FplDB(i, 5)
WpRec.WpLonDir = FplDB(i, 6)
Put #1, i, WpRec
Next i
Close #1
SaveOk = 1
FplSave = SaveOk
Exit Function
This function does a binary serialization of a matrix of 99 records (WpRec) to a file, using the "Open" and "Put" statements. But I don't understand how the data is encoded. This matters because I need to rewrite the same serialization in C#, and for that I need to know what encoding method VB6 uses so I can do the same in C#.
The tricky bit in VB6 was that you were allowed to declare structures with fixed length strings so that you could write records containing strings that didn't need a length prefix. The length of the string buffer was encoded into the type instead of needing to be written out with the record. This allowed for fixed size records. In .NET, this has kind of been left behind in the sense that VB.NET has a mechanism to support it for backward compatibility, but it's not really intended for C# as far as I can tell: How to declare a fixed-length string in VB.NET?.
.NET seems to have a preference for generally writing out strings with a length prefix, meaning that records are generally variable-length. This is suggested by the implementation of BinaryReader.ReadString.
However, you can use System.BitConverter to get finer control over how records are serialized and deserialized as bytes (System.IO.BinaryReader and System.IO.BinaryWriter are probably not useful here, since they assume strings have a length prefix). Keep in mind that a VB6 Integer maps to a .NET Int16 and a VB6 Long to a .NET Int32. I don't know exactly how you have defined your VB6 structure, but here's one possible implementation as an example:
class Program
{
static void Main(string[] args)
{
WpRecType[] WpRec = new WpRecType[3];
WpRec[0] = new WpRecType();
WpRec[0].WpIndex = 0;
WpRec[0].WpName = "New York";
WpRec[0].WpLat = 40.783f;
WpRec[0].WpLon = 73.967f;
WpRec[0].WpLatDir = 1;
WpRec[0].WpLonDir = 1;
WpRec[1] = new WpRecType();
WpRec[1].WpIndex = 1;
WpRec[1].WpName = "Minneapolis";
WpRec[1].WpLat = 44.983f;
WpRec[1].WpLon = 93.233f;
WpRec[1].WpLatDir = 1;
WpRec[1].WpLonDir = 1;
WpRec[2] = new WpRecType();
WpRec[2].WpIndex = 2;
WpRec[2].WpName = "Moscow";
WpRec[2].WpLat = 55.75f;
WpRec[2].WpLon = 37.6f;
WpRec[2].WpLatDir = 1;
WpRec[2].WpLonDir = 2;
byte[] buffer = new byte[WpRecType.RecordSize];
using (System.IO.FileStream stm =
new System.IO.FileStream(@"C:\Users\Public\Documents\FplDb.dat",
System.IO.FileMode.OpenOrCreate, System.IO.FileAccess.ReadWrite))
{
WpRec[0].SerializeInto(buffer);
stm.Write(buffer, 0, buffer.Length);
WpRec[1].SerializeInto(buffer);
stm.Write(buffer, 0, buffer.Length);
WpRec[2].SerializeInto(buffer);
stm.Write(buffer, 0, buffer.Length);
// Seek to record #1, load and display it
stm.Seek(WpRecType.RecordSize * 1, System.IO.SeekOrigin.Begin);
stm.Read(buffer, 0, WpRecType.RecordSize);
WpRecType rec = new WpRecType(buffer);
Console.WriteLine("[{0}] {1}: {2} {3}, {4} {5}", rec.WpIndex, rec.WpName,
rec.WpLat, (rec.WpLatDir == 1) ? "N" : "S",
rec.WpLon, (rec.WpLonDir == 1) ? "W" : "E");
}
}
}
class WpRecType
{
public short WpIndex;
public string WpName;
public Single WpLat;
public Single WpLon;
public byte WpLatDir;
public byte WpLonDir;
const int WpNameBytes = 40; // 20 unicode characters
public const int RecordSize = WpNameBytes + 12;
public void SerializeInto(byte[] target)
{
int position = 0;
target.Initialize();
BitConverter.GetBytes(WpIndex).CopyTo(target, position);
position += 2;
System.Text.Encoding.Unicode.GetBytes(WpName).CopyTo(target, position);
position += WpNameBytes;
BitConverter.GetBytes(WpLat).CopyTo(target, position);
position += 4;
BitConverter.GetBytes(WpLon).CopyTo(target, position);
position += 4;
target[position++] = WpLatDir;
target[position++] = WpLonDir;
}
public void Deserialize(byte[] source)
{
int position = 0;
WpIndex = BitConverter.ToInt16(source, position);
position += 2;
WpName = System.Text.Encoding.Unicode.GetString(source, position, WpNameBytes).TrimEnd('\0'); // strip the zero padding written by SerializeInto
position += WpNameBytes;
WpLat = BitConverter.ToSingle(source, position);
position += 4;
WpLon = BitConverter.ToSingle(source, position);
position += 4;
WpLatDir = source[position++];
WpLonDir = source[position++];
}
public WpRecType()
{
}
public WpRecType(byte[] source)
{
Deserialize(source);
}
}
Add a reference to Microsoft.VisualBasic and use FilePut
It is designed to assist with compatibility with VB6
The VB6 code in your question would be something like this in C# (I haven't compiled this):
Microsoft.VisualBasic.FileSystem.FileOpen(1, WritingPath + @"\FplDb.txt", OpenMode.Random,
    RecordLength: Marshal.SizeOf(WpRec));
for (int i = 1; i < 100; i++)
{
    WpRec.WpIndex = FplDB[i, 1];
    WpRec.WpName = FplDB[i, 2];
    WpRec.WpLat = FplDB[i, 3];
    WpRec.WpLon = FplDB[i, 4];
    WpRec.WpLatDir = FplDB[i, 5];
    WpRec.WpLonDir = FplDB[i, 6];
    Microsoft.VisualBasic.FileSystem.FilePut(1, WpRec, i);
}
Microsoft.VisualBasic.FileSystem.FileClose(1);
I think Marshal.SizeOf(WpRec) returns the same value that Len(WpRec) will return in VB6 - do check this though.
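The original VB6 Type isn't shown in the question, so the struct below is only a guess at what it might look like; it's just a sketch of how that size check could be done, not a drop-in definition. (As far as I know, FilePut also wants the fixed-length string member marked with VBFixedStringAttribute so it is written without a length descriptor.)
// Needs System, System.Runtime.InteropServices and Microsoft.VisualBasic.
// Hypothetical C# mirror of the VB6 Type - the field types and the
// 20-character name length are guesses, not taken from the question.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
public struct WpRecType
{
    public short WpIndex;                      // VB6 Integer -> Int16
    [VBFixedString(20)]
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 20)]
    public string WpName;                      // VB6 fixed-length String * 20
    public float WpLat;                        // VB6 Single
    public float WpLon;
    public byte WpLatDir;
    public byte WpLonDir;
}
// For the assumed layout this prints 32; compare it with what Len(WpRec) reports in VB6.
Console.WriteLine(Marshal.SizeOf(typeof(WpRecType)));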
The Put statement in VB6 does not do any encoding. It saves a structure just as it is stored internally in memory. For example, Put saves a Double as a 64-bit floating-point value, exactly as it is represented in memory. In your example, the members of WpRec are written by the Put statement just as WpRec is laid out in memory.
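If you are unsure how those bytes actually ended up on disk (for example whether the fixed-length string came out as ANSI or Unicode), the easiest check is to hex-dump the first record of the file the VB6 program wrote and compare it against the Type definition. A minimal sketch; the record length of 32 and the path are assumptions, use whatever Len(WpRec) reports and wherever your file lives:
// Dump offset, hex value and ASCII of the first record so you can see
// where the integer, the fixed-length string and the singles start.
const int recordLength = 32;                                        // assumed: use Len(WpRec) from VB6
byte[] rec = new byte[recordLength];
using (var fs = System.IO.File.OpenRead(@"C:\SomePath\FplDb.txt"))  // hypothetical path
    fs.Read(rec, 0, rec.Length);
for (int i = 0; i < rec.Length; i++)
    Console.WriteLine("{0,3}: 0x{1:X2} '{2}'", i, rec[i], (char)rec[i]);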
I have to create a utility that searches through 40 to 60 GiB of text files as quickly as possible.
Each file has around 50 MB of data that consists of log lines (about 630,000 lines per file).
A NoSQL document database is unfortunately not an option...
As of now I am using an Aho-Corasick algorithm for the search, which I stole from Tomas Petricek off of his blog. It works very well.
I process the files in Tasks. Each file is loaded into memory by simply calling File.ReadAllLines(path). The lines are then fed into the Aho-Corasick implementation one by one, so each file causes around 600,000 calls to the algorithm (I need the line number in my results).
This takes a lot of time and requires a lot of memory and CPU.
I have very little expertise in this field as I usually work in image processing.
Can you guys recommend algorithms and approaches which could speed up the processing?
Below is a more detailed view of the Task creation and file loading, which is pretty standard. For more information on the Aho-Corasick implementation, please visit the linked blog page above.
private KeyValuePair<string, StringSearchResult[]> FindInternal(
IStringSearchAlgorithm algo,
string file)
{
List<StringSearchResult> result = new List<StringSearchResult>();
string[] lines = File.ReadAllLines(file);
for (int i = 0; i < lines.Length; i++)
{
    var results = algo.FindAll(lines[i]);
    for (int j = 0; j < results.Length; j++)
    {
        results[j].Row = i;
    }
    result.AddRange(results);
}
return new KeyValuePair<string, StringSearchResult[]>(
file, result.ToArray());
}
public Dictionary<string, StringSearchResult[]> Find(
params string[] search)
{
IStringSearchAlgorithm algo = new StringSearch();
algo.Keywords = search;
Task<KeyValuePair<string, StringSearchResult[]>>[] findTasks
= new Task<KeyValuePair<string, StringSearchResult[]>>[_files.Count];
Parallel.For(0, _files.Count, i => {
findTasks[i] = Task.Factory.StartNew(
() => FindInternal(algo, _files[i])
);
});
Task.WaitAll(findTasks);
return findTasks.Select(t => t.Result)
.ToDictionary(x => x.Key, x => x.Value);
}
EDIT
See the Initial Answer section below for the original answer.
I further optimized my code by doing the following:
Added paging to prevent memory overflow / crashes due to the large amount of result data.
I offload the search results into local files as soon as they exceed a certain buffer size (64 KB in my case).
Offloading the results required me to convert my SearchData struct to binary and back (see the sketch after this list).
Splitting the array of files to be processed and running the parts in Tasks greatly increased performance (from 35 sec to 9 sec when processing about 25 GiB of search data).
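A minimal sketch of that binary conversion, assuming the Line / Search / Row fields of the SearchData struct shown further down (adjust the fields to whichever version of the struct you use); BinaryWriter length-prefixes the strings, which is fine for a private spill file:
// Needs System.IO and System.Collections.Generic.
// Spill a batch of results to a temp file...
static void WriteBatch(string path, IReadOnlyList<SearchData> batch)
{
    using (var bw = new BinaryWriter(File.Open(path, FileMode.Create)))
    {
        bw.Write(batch.Count);
        foreach (var sd in batch)
        {
            bw.Write(sd.Line ?? string.Empty);
            bw.Write(sd.Search ?? string.Empty);
            bw.Write(sd.Row);
        }
    }
}
// ...and read the batch back when merging the final result set.
static SearchData[] ReadBatch(string path)
{
    using (var br = new BinaryReader(File.OpenRead(path)))
    {
        var batch = new SearchData[br.ReadInt32()];
        for (int i = 0; i < batch.Length; i++)
            batch[i] = new SearchData(br.ReadString(), br.ReadString(), br.ReadInt32());
        return batch;
    }
}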
Splitting / scaling the file array
The code below computes a scaled/normalized value from T_min and T_max.
This value can then be used to determine the size of each array holding a batch of file paths.
private int ScalePartition(int T_min, int T_max)
{
// Scale m to range.
int m = T_max / 2;
int t_min = 4;
int t_max = Math.Max(T_max / 16, T_min);
m = ((T_min - m) / (T_max - T_min)) * (t_max - t_min) + t_max;
return m;
}
This code shows the implementation of the scaling and splitting.
// Get size of file array portion.
int scale = ScalePartition(1, _files.Count);
// Iterator.
int n = 0;
// List containing tasks.
List<Task<SearchData[]>> searchTasks = new List<Task<SearchData[]>>();
// Loop through files.
while (n < _files.Count) {
// Local instance of n.
// You will get an AggregateException if you use n
// as n changes during runtime.
int num = n;
// The amount of items to take.
// This needs to be calculated as there might be an
// odd number of elements in the file array.
int cnt = n + scale > _files.Count ? _files.Count - n : scale;
// Run the Find(int, int, Regex[]) method and add as task.
searchTasks.Add(Task.Run(() => Find(num, cnt, regexes)));
// Increment iterator by the amount of files stored in scale.
n += scale;
}
Initial Answer
I had the best results so far after switching to MemoryMappedFile and moving from Aho-Corasick back to Regex (a demand was made that pattern matching is a must-have).
There are still parts that can be optimized or changed, and I'm sure this is not the fastest or best solution, but for now it's alright.
Here is the code which returns the results in 30 seconds for 25 GiB worth of data:
// GNU coreutil wc defined buffer size.
// Had best performance with this buffer size.
//
// Definition in wc.c:
// -------------------
// /* Size of atomic reads. */
// #define BUFFER_SIZE (16 * 1024)
//
private const int BUFFER_SIZE = 16 * 1024;
private KeyValuePair<string, SearchData[]> FindInternal(Regex[] rgx, string file)
{
// Buffer for data segmentation.
byte[] buffer = new byte[BUFFER_SIZE];
// Get size of file.
FileInfo fInfo = new FileInfo(file);
long fSize = fInfo.Length;
fInfo = null;
// List of results.
List<SearchData> results = new List<SearchData>();
// Create MemoryMappedFile.
string name = "mmf_" + Path.GetFileNameWithoutExtension(file);
using (var mmf = MemoryMappedFile.CreateFromFile(
file, FileMode.Open, name))
{
// Create read-only in-memory access to file data.
using (var accessor = mmf.CreateViewStream(
0, fSize,
MemoryMappedFileAccess.Read))
{
// Store current position.
int pos = (int)accessor.Position;
// Use the smaller of the file size and the
// default buffer size for the first read.
int cnt = (int)(fSize > BUFFER_SIZE
    ? BUFFER_SIZE
    : fSize);
// Iterate through file until end of file is reached.
while (accessor.Position < fSize)
{
// Write data to buffer.
accessor.Read(buffer, 0, cnt);
// Update position.
pos = (int)accessor.Position;
// Update next buffer size.
cnt = (int)(fSize - pos >= BUFFER_SIZE
? BUFFER_SIZE
: fSize - pos);
// Convert buffer data to string for Regex search.
string s = Encoding.UTF8.GetString(buffer);
// Run regex against extracted data.
foreach (Regex r in rgx) {
// Get matches.
MatchCollection matches = r.Matches(s);
// Create SearchData struct to reduce memory
// impact and only keep relevant data.
foreach (Match m in matches) {
SearchData sd = new SearchData();
// The actual matched string.
sd.Match = m.Value;
// The index in the file.
sd.Index = m.Index + pos;
// Index to find beginning of line.
int nFirst = m.Index;
// Index to find end of line.
int nLast = m.Index;
// Go back in the line until the end of the
// preceding line has been found.
while (s[nFirst] != '\n' && nFirst > 0) {
nFirst--;
}
// Append length of \r\n (new line).
// Change this to 1 if you work on Unix system.
nFirst+=2;
// Go forth in line until the end of the
// current line has been found.
while (s[nLast] != '\n' && nLast < s.Length-1) {
nLast++;
}
// Remove length of \r\n (new line).
// Change this to 1 if you work on Unix system.
nLast-=2;
// Store whole line in SearchData struct.
sd.Line = s.Substring(nFirst, nLast - nFirst);
// Add result.
results.Add(sd);
}
}
}
}
}
return new KeyValuePair<string, SearchData[]>(file, results.ToArray());
}
public List<KeyValuePair<string, SearchData[]>> Find(params string[] search)
{
var results = new List<KeyValuePair<string, SearchData[]>>();
// Prepare regex objects.
Regex[] regexes = new Regex[search.Length];
for (int i=0; i<regexes.Length; i++) {
regexes[i] = new Regex(search[i], RegexOptions.Compiled);
}
// Get all search results.
// Creating the Regex once and passing it
// to the sub-routine is best as the regex
// engine adds a lot of overhead.
foreach (var file in _files) {
var data = FindInternal(regexes, file);
results.Add(data);
}
return results;
}
I had a stupid idea yesterday where I thought it might work out to convert the file data to a bitmap and look for the input within the pixels, as pixel checking is quite fast.
Just for the giggles... here is the non-optimized test code for that stupid idea:
public struct SearchData
{
public string Line;
public string Search;
public int Row;
public SearchData(string l, string s, int r) {
Line = l;
Search = s;
Row = r;
}
}
internal static class FileToImage
{
public static unsafe SearchData[] FindText(string search, Bitmap bmp)
{
byte[] buffer = Encoding.ASCII.GetBytes(search);
BitmapData data = bmp.LockBits(
new Rectangle(0, 0, bmp.Width, bmp.Height),
ImageLockMode.ReadOnly, bmp.PixelFormat);
List<SearchData> results = new List<SearchData>();
int bpp = Bitmap.GetPixelFormatSize(bmp.PixelFormat) / 8;
byte* ptFirst = (byte*)data.Scan0;
byte firstHit = buffer[0];
bool isFound = false;
for (int y=0; y<data.Height; y++) {
byte* ptStride = ptFirst + (y * data.Stride);
for (int x=0; x<data.Stride; x++) {
if (firstHit == ptStride[x]) {
byte[] temp = new byte[buffer.Length];
if (buffer.Length < data.Stride-x) {
int ret = 0;
for (int n=0, xx=x; n<buffer.Length; n++, xx++) {
if (ptStride[xx] != buffer[n]) {
break;
}
ret++;
}
if (ret == buffer.Length) {
int lineLength = 0;
for (int n = 0; n<data.Stride; n+=bpp) {
if (ptStride[n+2] == 255 &&
ptStride[n+1] == 255 &&
ptStride[n+0] == 255)
{
lineLength=n;
}
}
SearchData sd = new SearchData();
byte[] lineBytes = new byte[lineLength];
Marshal.Copy((IntPtr)ptStride, lineBytes, 0, lineLength);
sd.Search = search;
sd.Line = Encoding.ASCII.GetString(lineBytes);
sd.Row = y;
results.Add(sd);
}
}
}
}
}
bmp.UnlockBits(data);
return results.ToArray();
}
private static unsafe Bitmap GetBitmapInternal(string[] lines, int startIndex, Bitmap bmp)
{
int bpp = Bitmap.GetPixelFormatSize(bmp.PixelFormat) / 8;
BitmapData data = bmp.LockBits(
new Rectangle(0, 0, bmp.Width, bmp.Height),
ImageLockMode.ReadWrite,
bmp.PixelFormat);
int index = startIndex;
byte* ptFirst = (byte*)data.Scan0;
int maxHeight = bmp.Height;
if (lines.Length - startIndex < maxHeight) {
maxHeight = lines.Length - startIndex -1;
}
for (int y = 0; y < maxHeight; y++) {
byte* ptStride = ptFirst + (y * data.Stride);
index++;
int max = lines[index].Length;
max += (max % bpp);
lines[index] += new string('\0', max % bpp);
max = lines[index].Length;
for (int x=0; x+2<max; x+=bpp) {
ptStride[x+0] = (byte)lines[index][x+0];
ptStride[x+1] = (byte)lines[index][x+1];
ptStride[x+2] = (byte)lines[index][x+2];
}
ptStride[max+2] = 255;
ptStride[max+1] = 255;
ptStride[max+0] = 255;
for (int x = max + bpp; x < data.Stride; x += bpp) {
ptStride[x+2] = 0;
ptStride[x+1] = 0;
ptStride[x+0] = 0;
}
}
bmp.UnlockBits(data);
return bmp;
}
public static unsafe Bitmap[] GetBitmap(string filePath)
{
int bpp = Bitmap.GetPixelFormatSize(PixelFormat.Format24bppRgb) / 8;
var lines = System.IO.File.ReadAllLines(filePath);
int y = 0x800; //lines.Length / 0x800;
int x = lines.Max(l => l.Length) / bpp;
int cnt = (int)Math.Ceiling((float)lines.Length / (float)y);
Bitmap[] results = new Bitmap[cnt];
for (int i = 0; i < results.Length; i++) {
results[i] = new Bitmap(x, y, PixelFormat.Format24bppRgb);
results[i] = GetBitmapInternal(lines, i * 0x800, results[i]);
}
return results;
}
}
You can split the file into partitions and regex-search each partition in parallel, then join the results. There are some sharp edges in the details, like handling values that span two partitions. Gigantor is a C# library I created that does exactly this. Feel free to try it or have a look at the source code.
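Not Gigantor itself, just a minimal sketch of the idea under two simplifying assumptions: the log data is single-byte (ASCII) text, and no match is longer than the overlap added to each partition. Matches that start inside a partition's overlap are dropped, because the next partition reports them:
// Needs System.Collections.Concurrent, System.IO, System.Linq, System.Text,
// System.Text.RegularExpressions and System.Threading.Tasks.
static List<(long Offset, string Value)> ParallelRegexSearch(
    string path, Regex pattern, int partitionCount = 8, int overlap = 1024)
{
    long fileLength = new FileInfo(path).Length;
    long partSize = (fileLength + partitionCount - 1) / partitionCount;
    var results = new ConcurrentBag<(long, string)>();
    Parallel.For(0, partitionCount, p =>
    {
        long start = p * partSize;
        if (start >= fileLength) return;
        // Read this partition plus a little overlap so a match that crosses
        // the partition boundary is still completed.
        int count = (int)Math.Min(partSize + overlap, fileLength - start);
        var buffer = new byte[count];
        using (var fs = File.OpenRead(path))
        {
            fs.Seek(start, SeekOrigin.Begin);
            fs.Read(buffer, 0, count);
        }
        string text = Encoding.UTF8.GetString(buffer);
        foreach (Match m in pattern.Matches(text))
        {
            // Keep only matches that start inside this partition's own range;
            // the overlap region belongs to the next partition.
            if (m.Index < partSize)
                results.Add((start + m.Index, m.Value));
        }
    });
    return results.OrderBy(r => r.Item1).ToList();
}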
This is a university assignment. I have to do a radix sort on car registration plates (ABC 123) in two ways: 1) array, 2) linked list. The most interesting part is that the sorting MUST BE done in the file. From now on I will talk only about the array version: I generate the registration plates and put them in an array, then with a binary write I write all of the generated plates to the file. After that I hand the newly generated file to the radix sort, which has to do the magic.
I will show you the code that I have at the moment, but it's not actually a 'real' radix sort, because I cannot figure out how to implement a radix sort inside a file (I have implemented radix sort for a normal array and a linked list, but doing it INSIDE a file blows my mind). I just wanted to ask if any of you have tips or ideas on how I could improve the sorting algorithm, because it is really slow. Thank you.
PROGRAM.CS
public static void CountingSort(DataArray items, int exp)
{
UTF8Encoding encoder = new UTF8Encoding();
Byte[] forChange = new byte[16];
double first, second;
int i, j;
NumberPlate plate1;
NumberPlate plate2;
for (int z = 0; z < items.Length; z++)
{
i = 0;
j = 1;
while (j < items.Length)
{
BitConverter.GetBytes(items[i]).CopyTo(forChange, 0);
BitConverter.GetBytes(items[j]).CopyTo(forChange, 8);
string firstPlate = encoder.GetString(forChange, 1, 7);
string[] partsFirst = firstPlate.Split(' ');
plate1 = new NumberPlate(partsFirst[0], partsFirst[1]);
string secondPlate = encoder.GetString(forChange, 9, 7);
string[] partsSecond = secondPlate.Split(' ');
plate2 = new NumberPlate(partsSecond[0], partsSecond[1]);
first = plate1.GetPlateCode() / exp % 10;
second = plate2.GetPlateCode() / exp % 10;
if (first > second)
{
items.Swap(j, BitConverter.ToDouble(forChange, 0), BitConverter.ToDouble(forChange, 8));
}
i++;
j++;
}
}
}
public static void Radix_Sort(DataArray items)
{
for (int exp = 1; exp < Math.Pow(10, 9); exp *= 10)
{
CountingSort(items, exp);
}
}
public static void Test_File_Array_List(int seed)
{
int n = 5;
string filename;
filename = @"mydataarray.txt";
//filename = @"mydataarray.dat";
MyFileArray myfilearray = new MyFileArray(filename, n);
using (myfilearray.fs = new FileStream(filename, FileMode.Open, FileAccess.ReadWrite))
{
Console.WriteLine("\n FILE ARRAY \n");
myfilearray.Print(n);
Radix_Sort(myfilearray);
myfilearray.Print(n);
}
}
ARRAY.CS
public override double this[int index]
{
get
{
Byte[] data = new Byte[8];
fs.Seek(8 * index, SeekOrigin.Begin);
fs.Read(data, 0, 8);
double result = BitConverter.ToDouble(data, 0);
return result;
}
}
public override void Swap(int j, double a, double b)
{
Byte[] data = new Byte[16];
BitConverter.GetBytes(b).CopyTo(data, 0);
BitConverter.GetBytes(a).CopyTo(data, 8);
fs.Seek(8 * (j - 1), SeekOrigin.Begin);
fs.Write(data, 0, 16);
}
If the assignment mentions an array and a linked list, then it would seem that the file is only used to read the data into the array or linked list, then the sort is done, and the result is written back to a file.
For a file-based radix sort: for each digit (right to left), 10 temp files are created; the data is read and each record is appended to the temp file corresponding to its current digit; the 10 temp files are then closed and concatenated into a single working file for the next radix sort pass. For each letter position, 26 temp files would be used, as in the sketch below.
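A minimal sketch of one such pass in C#, assuming the working file holds the plates as fixed-format "ABC 123" text records, one per line; the record reading and writing would have to be adapted to the binary layout the assignment actually uses:
// Needs System and System.IO.
// One bucket-distribution pass: read every record, append it to the temp file
// for its current character, then concatenate the temp files back together.
static void RadixPass(string workFile, int charPos, int bucketCount, Func<char, int> bucketOf)
{
    var buckets = new StreamWriter[bucketCount];
    for (int b = 0; b < bucketCount; b++)
        buckets[b] = new StreamWriter($"bucket_{b}.tmp");
    foreach (string plate in File.ReadLines(workFile))
        buckets[bucketOf(plate[charPos])].WriteLine(plate);
    foreach (var w in buckets)
        w.Dispose();
    // Concatenating in bucket order keeps the pass stable, which LSD radix sort needs.
    using (var output = new StreamWriter(workFile, append: false))
        for (int b = 0; b < bucketCount; b++)
            foreach (string plate in File.ReadLines($"bucket_{b}.tmp"))
                output.WriteLine(plate);
}
// For "ABC 123": three digit passes (right to left), then three letter passes.
// RadixPass(file, 6, 10, c => c - '0');
// RadixPass(file, 5, 10, c => c - '0');
// RadixPass(file, 4, 10, c => c - '0');
// RadixPass(file, 2, 26, c => c - 'A');
// RadixPass(file, 1, 26, c => c - 'A');
// RadixPass(file, 0, 26, c => c - 'A');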
I need to write and read big CSV (comma separated value) files, which basically contain integer values converted to strings. For reading such files efficiently, .Net Core has introduced a new Parse method for the type int:
public static int Parse (ReadOnlySpan<char> s,
System.Globalization.NumberStyles style =
System.Globalization.NumberStyles.Integer, IFormatProvider provider = null);
This allows me to use a StreamReader to read the characters of the file into a character array. My program then has to find the positions of the separator characters, create a ReadOnlySpan over the characters between two separators, and convert them into an int without first creating a string from those characters. Since my files contain millions of values, avoiding the creation of millions of strings should result in faster file reading. I hope.
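Roughly like this, a simplified sketch that pulls the ints out of one comma separated line already sitting in a char buffer:
// Parse the ints out of one CSV line held in a char span,
// without allocating a string per value.
static void ParseLine(ReadOnlySpan<char> line, List<int> values)
{
    while (!line.IsEmpty)
    {
        int comma = line.IndexOf(',');
        ReadOnlySpan<char> field = comma < 0 ? line : line.Slice(0, comma);
        if (!field.IsEmpty)
            values.Add(int.Parse(field));   // the new span-based Parse, no string created
        line = comma < 0 ? ReadOnlySpan<char>.Empty : line.Slice(comma + 1);
    }
}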
But how about writing the int values as strings to the file? Traditionally, it would be done like this:
var int1 = 1;
var int2 = 2;
streamWriter.WriteLine(int1.ToString() + "," + int2.ToString());
Again, for each int a string gets created and then another string for each line. This will create millions of strings that need to be garbage collected.
I would prefer something like that:
char[] charArray = getEmptyCharArray();
var span = new Span<char>(charArray);
int length1 = span.Write(int1);
charArray[length1] = ',';
span = span.Slice(length1 + 1);
int length2 = span.Write(int2);
streamWriter.Write(charArray, 0, length1 + 1 + length2);
getEmptyCharArray() provides a character array which gets reused.
Unfortunately, Span has no Write() function :-(
So the question is: How can I write an int (or DateTime or Decimal or ...) into a Span without generating any garbage collected objects (strings) ?
Note that any answer given before 2018 is probably not what is needed here, because System.Span was only introduced in .NET Core 2.1. Also note that the question here is about System.Span and not the HTML span or any other Span.
Thanks to the comment from Ian Kent, I asked on https://gitter.im/dotnet/corefx and they knew the answer. It's embarrassingly simple:
var i = 1;
Span<char> span = new char[100];
var ok = i.TryFormat(span, out var charsWritten);
Since I didn't find this answer for several days and wanted to move on with my code, I wrote my own method using char[] instead of Span. I measured with BenchmarkRunner the speed of the different methods for writing a 50 megabyte CSV file with 7,000,000 ints:
60 ms: Writing the same constant string. This gives a baseline for how long .NET needs just to write the file:
for (int i = 0; i < iterations; i++) {
streamWriter.WriteLine("1;12;123;1234;12345;123456;1234567;12345678;123;");
}
610 ms: Using ToString()
for (int i = 0; i < iterations; i++) {
streamWriter.WriteLine($"{i};{i+1};{i+2};{i+3};{i+4};{i+5};{i+6};");
}
308 ms: Using TryFormat(Span)
185 ms: Using my own method and char[]
It's amazing that the string conversions take 10 times longer than writing the actual file. I would have expected the hard disk to be much slower than any software.
We are told that Span will solve many performance problems. Here it doesn't help by much; it seems it would have been better if they had used char[].
Span test code
public void WriteTo4() {
var PathFileName = directoryInfo.FullName + @"\Test1.csv";
using (var fileStream = new FileStream(PathFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None, bufferSize, FileOptions.SequentialScan)) {
using (var streamWriter = new StreamWriter(fileStream)) {
var lineBuffer = new char[100];
Span<char> span = lineBuffer;
for (int i = 0; i < iterations; i++) {
var ok = i.TryFormat(span, out var charsWritten);
lineBuffer[charsWritten++] = ';';
var span1 = span[charsWritten..];
ok = (i+1).TryFormat(span1, out charsWritten);
span1[charsWritten++] = ';';
span1 = span1[charsWritten..];
ok = (i+2).TryFormat(span1, out charsWritten);
span1[charsWritten++] = ';';
span1 = span1[charsWritten..];
ok = (i+3).TryFormat(span1, out charsWritten);
span1[charsWritten++] = ';';
span1 = span1[charsWritten..];
ok = (i+4).TryFormat(span1, out charsWritten);
span1[charsWritten++] = ';';
span1 = span1[charsWritten..];
ok = (i+5).TryFormat(span1, out charsWritten);
span1[charsWritten++] = ';';
span1 = span1[charsWritten..];
ok = (i+6).TryFormat(span1, out charsWritten);
span1[charsWritten++] = ';';
var ca = lineBuffer[..(lineBuffer.Length - span1.Length + charsWritten)];
streamWriter.WriteLine(lineBuffer, 0, lineBuffer.Length - span1.Length + charsWritten);
}
}
}
}
Test code using char[]
public void WriteTo3() {
var PathFileName = directoryInfo.FullName + @"\Test1.csv";
using (var fileStream = new FileStream(PathFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None, bufferSize, FileOptions.SequentialScan)) {
using (var streamWriter = new StreamWriter(fileStream)) {
var lineBuffer = new char[100];
for (int i = 0; i < iterations; i++) {
var index = 0;
lineBuffer.Write3(i, ref index);
lineBuffer[index++] = ';';
lineBuffer.Write3(i+1, ref index);
lineBuffer[index++] = ';';
lineBuffer.Write3(i+2, ref index);
lineBuffer[index++] = ';';
lineBuffer.Write3(i+3, ref index);
lineBuffer[index++] = ';';
lineBuffer.Write3(i+4, ref index);
lineBuffer[index++] = ';';
lineBuffer.Write3(i+5, ref index);
lineBuffer[index++] = ';';
lineBuffer.Write3(i+6, ref index);
lineBuffer[index++] = ';';
streamWriter.WriteLine(lineBuffer, 0, index);
}
}
}
}
public static void Write3(this char[] charArray, int i, ref int index) {
if (i<0) {
charArray[index++] = '-';
i = -i;
}
int start = index;
while (i>9) {
charArray[index++] = (char)((i % 10) + '0');
i /= 10;
}
charArray[index++] = (char)(i + '0');
var end = index-1;
while (end>start) {
var temp = charArray[end];
charArray[end--] = charArray[start];
charArray[start++] = temp;
}
}
How about parsing the int directly into a char array by going through all its digits, converting them to chars and storing them directly at the destination?
public static ReadOnlySpan<char> ToSpan(int src)
{
int len = GetLength(src);
Span<char> chars = new char[len];
for (int i = 0; i < chars.Length; i++)
{
chars[i]= (char)((Math.Floor(src / Math.Pow(10, (chars.Length - i - 1))) % 10) + 48);
}
return chars;
static int GetLength(int src)
{
int len = 0;
while (src > 0)
{
src = src / 10;
len++;
}
return len;
}
}
static void Main(string[] args)
{
int original = 3334;
var data = ToSpan(original);
var copy= int.Parse(data);
Console.WriteLine(copy);
}
P.S.
The bad part is that you need to iterate over the int first in order to get the length of the destination.
You can surely do some optimizations regarding the way I convert a digit to a char, and maybe also the way I extract the digits.
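For example, one way to avoid the separate length pass is to fill a scratch buffer from the end and hand back only the used slice; this also avoids reversing the digits afterwards. A sketch:
// Fill the digits from the end of a scratch span and return the used slice.
// Works with negative magnitudes so int.MinValue does not overflow on negation.
static ReadOnlySpan<char> ToSpanFromEnd(int value, Span<char> scratch)
{
    bool negative = value < 0;
    int v = negative ? value : -value;
    int pos = scratch.Length;
    do
    {
        scratch[--pos] = (char)('0' - (v % 10));
        v /= 10;
    } while (v != 0);
    if (negative)
        scratch[--pos] = '-';
    return scratch.Slice(pos);
}
// Usage:
// ReadOnlySpan<char> digits = ToSpanFromEnd(-42, new char[12]);   // "-42"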
My assignment is to search through the binary representation of a number and replace a matched pattern of another binary representation of a number. If I get a match, I convert the matching bits from the first integer into zeroes and move on.
For example, the number 469 would be 111010101, and I have to match it against 5 (101). Here's the program I've written so far; it doesn't work as expected.
using System;
namespace Conductors
{
class Program
{
static void Main(string[] args)
{
//this is the number I'm searching for a match in
int binaryTicket = 469;
//This is the pattern I'm trying to match (101)
int binaryPerforator = 5;
string binaryTicket01 = Convert.ToString(binaryTicket, 2);
bool match = true;
//in a 32 bit integer, position 29 is the last one I would
//search in, since I'm searching for the next 3
for (int pos = 0; pos < 29; pos++)
{
for (int j = 0; j <= 3; j++)
{
var posInBinaryTicket = pos + j;
var posInPerforator = j;
int bitInBinaryTicket = (binaryTicket & (1 << posInBinaryTicket)) >> posInBinaryTicket;
int bitInPerforator = (binaryPerforator & (1 << posInPerforator)) >> posInPerforator;
if (bitInBinaryTicket != bitInPerforator)
{
match = false;
break;
}
else
{
//what would be the proper bitwise operator here?
bitInBinaryTicket = 0;
}
}
Console.WriteLine(binaryTicket01);
}
}
}
}
A few things:
Use uint for this. Makes things a hell of a lot easier when dealing with binary numbers.
You aren't really setting anything - you're simply storing information, which is why you're printing out the same number so often.
You should loop x times, where x is the length of the binary string (not a fixed 29). There's no need for an inner loop.
static void Main(string[] args)
{
//this is the number I'm searching for a match in
uint binaryTicket = 469;
//This is the pattern I'm trying to match (101)
uint binaryPerforator = 5;
var numBinaryDigits = Math.Ceiling(Math.Log(binaryTicket, 2));
for (var i = 0; i < numBinaryDigits; i++)
{
var perforatorShifted = binaryPerforator << i;
//We need to mask off the result (otherwise we fail for checking 101 -> 111)
//The mask will put 1s in each place the perforator is checking.
var perforDigits = (int)Math.Ceiling(Math.Log(perforatorShifted, 2));
uint mask = (uint)Math.Pow(2, perforDigits) - 1;
Console.WriteLine("Ticket:\t" + GetBinary(binaryTicket));
Console.WriteLine("Perfor:\t" + GetBinary(perforatorShifted));
Console.WriteLine("Mask :\t" + GetBinary(mask));
if ((binaryTicket & mask) == perforatorShifted)
{
Console.WriteLine("Match.");
//Imagine we have the case:
//Ticket:
//111010101
//Perforator:
//000000101
//Is a match. What binary operation can we do to 0-out the final 101?
//We need to AND it with
//111111010
//To get that value, we need to invert the perforatorShifted
//000000101
//XOR
//111111111
//EQUALS
//111111010
//Which would yield:
//111010101
//AND
//111110000
//Equals
//111010000
var flipped = perforatorShifted ^ ((uint)0xFFFFFFFF);
binaryTicket = binaryTicket & flipped;
}
}
string binaryTicket01 = Convert.ToString(binaryTicket, 2);
Console.WriteLine(binaryTicket01);
}
static string GetBinary(uint v)
{
return Convert.ToString(v, 2).PadLeft(32, '0');
}
Please read over the above code - if there's anything you don't understand, leave me a comment and I can run through it with you.
I am trying to edit lines of a text file (a .hex file containing only hex characters) without using pointers, and in a more efficient way.
It currently takes very long because the program has to edit around 30×4 bytes (30 float values) at addresses taken from the hex file.
Every time the program replaces one byte, it searches the complete file, replaces the value, and copies the result to another file. This process repeats 30 times, which is quite time consuming and clearly not the right approach.
What would be the most efficient method?
public static string putbyteinhexfile(int address, char data, string total)
{
int temph, temphl, tempht;
ushort checksum = 0;
string output = null, hexa = null;
StreamReader hex;
RegistryKey reg = Registry.CurrentUser;
reg = reg.OpenSubKey("Software\\Calibratortest");
hex = new StreamReader(((string)reg.GetValue("Select Input Hex File")));
StreamReader map = new StreamReader((string)reg.GetValue("Select Linker Map File"));
while ((output = hex.ReadLine()) != null)
{
checksum = 0;
temph = Convert.ToInt16(("0x" + output.Substring(3, 4)), 16);
temphl = Convert.ToInt16(("0x" + output.Substring(1, 2)), 16);
tempht = Convert.ToInt16(("0x" + output.Substring(7, 2)), 16);
if (address >= temph &&
address < temph + temphl &&
tempht == 0)
{
output = output.Remove((address - temph) * 2 + 9, 2);
output = output.Insert((address - temph) * 2 + 9,
String.Format("{0:X2}", Convert.ToInt16(data)));
for (int i = 1; i < (output.Length - 1) / 2; i++)
checksum += (ushort)Convert.ToUInt16(output.Substring((i * 2) - 1, 2), 16);
hexa = ((~checksum + 1).ToString("x8")).ToUpper();
output = output.Remove(temphl * 2 + 9, 2);
output = output.Insert(temphl * 2 + 9,
hexa.Substring(hexa.Length - 2, 2));
break;
}
else total = total + output + '\r' + '\n';
}
hex.Close();
map.Close();
return total;
}
Assuming you don't want to massively rewrite your existing logic which does 'for each line, do this search and replace logic', I'd think the simplest change would be:
var lines = File.ReadAllLines(filePath);
foreach (change to make)
{
for (int i = 0; i < lines.Length; i++)
{
// read values from line
if (need_to_modify)
{
// whatever change logic you want here.
lines[i] = lines[i].Replace(...);
}
}
}
File.WriteAllLines(filePath, lines);
Basically, you'll still do the logic you have now, except:
You read the file once instead of N times
you get rid of streamreader / streamwriter work
you do your changes on the array of strings in memory
string fileName = "blabla.hex";
StreamReader f1 = File.OpenText(fileName);
StreamWriter f2 = File.CreateText(fileName + ".temp_");
while (!f1.EndOfStream)
{
String s = f1.ReadLine();
//change the content of the variable 's' as you wish
f2.WriteLine(s);
}
f1.Close();
f2.Close();
File.Replace(fileName + ".temp_", fileName, null);