Reading a string array from an HDF5 dataset


I am trying to read a string dataset from an HDF5 file in C# into an array of strings. I was able to read an integer dataset using the following code:
//read the no of rows and columns
var datasetID = H5D.open(fileId,"dimensions");
var dataTypeId = H5D.getType(datasetID);
var dataType = H5T.getClass(dataTypeId);
var length = H5T.getSize(dataTypeId);
int[] dDim = new int[length];
H5D.read(datasetID, dataTypeId, new H5Array<int>(dDim));
I tried to do the same for a string dataset, but all the values come back as null. So I referred to this link (https://www.mail-archive.com/hdf-forum@hdfgroup.org/msg02980.html). I was able to read the strings as bytes, but I don't know what size the byte array should be initialized to. The code I have right now to read the strings is this:
//read string
datasetID = H5D.open(fileId, "names");
var dataSpaceId = H5D.getSpace(datasetID);
long[] dims = H5S.getSimpleExtentDims(dataSpaceId);
dataTypeId = H5T.copy(H5T.H5Type.C_S1);
//hard-coding the size of each string in bytes (213)
byte[] buffer = new byte[dims[0]*213];
Console.WriteLine(dims[0]);
H5D.read(datasetID, dataTypeId, new H5Array<byte>(buffer));
Console.WriteLine(System.Text.ASCIIEncoding.ASCII.GetString(buffer));

If you do not know in advance what your data type will be, try the following code. It does not cover every data type, but it is easy to extend:
public static Array Read1DArray(this H5FileId fileId, string dataSetName)
{
var dataset = H5D.open(fileId, dataSetName);
var space = H5D.getSpace(dataset);
var dims = H5S.getSimpleExtentDims(space);
var dtype = H5D.getType(dataset);
var size = H5T.getSize(dtype);
var classID = H5T.getClass(dtype);
var rank = H5S.getSimpleExtentNDims(space);
var status = H5S.getSimpleExtentDims(space);
// Read data into byte array
var dataArray = new Byte[status[0]*size];
var wrapArray = new H5Array<Byte>(dataArray);
H5D.read(dataset, dtype, wrapArray);
// Convert types
Array returnArray = null;
Type dataType = null;
switch (classID)
{
case H5T.H5TClass.STRING:
dataType = typeof(string);
break;
case H5T.H5TClass.FLOAT:
if (size == 4)
dataType = typeof(float);
else if (size == 8)
dataType = typeof(double);
break;
case H5T.H5TClass.INTEGER:
if (size == 2)
dataType = typeof(Int16);
else if (size == 4)
dataType = typeof(Int32);
else if (size == 8)
dataType = typeof(Int64);
break;
}
if (dataType == typeof (string))
{
var cSet = H5T.get_cset(dtype);
string[] stringArray = new String[status[0]];
for (int i = 0; i < status[0]; i++)
{
byte[] buffer = new byte[size];
Array.Copy(dataArray, i*size, buffer, 0, size);
Encoding enc = null;
switch (cSet)
{
case H5T.CharSet.ASCII:
enc = new ASCIIEncoding();
break;
case H5T.CharSet.UTF8:
enc = new UTF8Encoding();
break;
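case H5T.CharSet.ERROR:
default:
// Fail with a clear message instead of a NullReferenceException on enc below
throw new InvalidOperationException("Unsupported or invalid HDF5 character set.");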
}
stringArray[i] = enc.GetString(buffer).TrimEnd('\0');
}
returnArray = stringArray;
}
else
{
returnArray = Array.CreateInstance(dataType, status[0]);
Buffer.BlockCopy(dataArray, 0, returnArray, 0, (int) status[0]*size);
}
H5S.close(space);
H5T.close(dtype);
H5D.close(dataset);
return returnArray;
}

Your start was exceptionally helpful! With it and some help from the HDF5 example code, I was able to come up with some generic extensions that reduce your code to:
//read string
string[] datasetValue = fileId.Read1DArray<string>("names");
The extensions look something like this (which is, or should be, exactly the same as from the referenced question.):
public static class HdfExtensions
{
// thank you https://stackoverflow.com/questions/4133377/splitting-a-string-number-every-nth-character-number
public static IEnumerable<String> SplitInParts(this String s, Int32 partLength)
{
if (s == null)
throw new ArgumentNullException("s");
if (partLength <= 0)
throw new ArgumentException("Part length has to be positive.", "partLength");
for (var i = 0; i < s.Length; i += partLength)
yield return s.Substring(i, Math.Min(partLength, s.Length - i));
}
public static T[] Read1DArray<T>(this H5FileId fileId, string dataSetName)
{
var dataset = H5D.open(fileId, dataSetName);
var space = H5D.getSpace(dataset);
var dims = H5S.getSimpleExtentDims(space);
var dataType = H5D.getType(dataset);
if (typeof(T) == typeof(string))
{
int stringLength = H5T.getSize(dataType);
byte[] buffer = new byte[dims[0] * stringLength];
H5D.read(dataset, dataType, new H5Array<byte>(buffer));
string stuff = System.Text.ASCIIEncoding.ASCII.GetString(buffer);
return stuff.SplitInParts(stringLength).Select(ss => (T)(object)ss).ToArray();
}
T[] dataArray = new T[dims[0]];
var wrapArray = new H5Array<T>(dataArray);
H5D.read(dataset, dataType, wrapArray);
return dataArray;
}
public static T[,] Read2DArray<T>(this H5FileId fileId, string dataSetName)
{
var dataset = H5D.open(fileId, dataSetName);
var space = H5D.getSpace(dataset);
var dims = H5S.getSimpleExtentDims(space);
var dataType = H5D.getType(dataset);
if (typeof(T) == typeof(string))
{
// this will also need a string hack...
}
T[,] dataArray = new T[dims[0], dims[1]];
var wrapArray = new H5Array<T>(dataArray);
H5D.read(dataset, dataType, wrapArray);
return dataArray;
}
}
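One detail the string branch above leaves open: fixed-length HDF5 strings shorter than the slot size are typically padded, so splitting the decoded text into equal parts keeps the padding characters. The following is a minimal sketch of decoding such a buffer slot by slot, assuming null ('\0') padding. It uses only .NET calls and no HDF5 API; the names FixedLengthStrings and Decode are hypothetical and not part of any library.

```csharp
using System;
using System.Text;

public static class FixedLengthStrings
{
    // Splits a flat buffer of fixed-width, null-padded strings into a string array.
    // 'stringLength' is the size of one slot in bytes (the value H5T.getSize returned above).
    public static string[] Decode(byte[] buffer, int stringLength, Encoding encoding)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));
        if (stringLength <= 0) throw new ArgumentOutOfRangeException(nameof(stringLength));
        if (buffer.Length % stringLength != 0)
            throw new ArgumentException("Buffer length must be a multiple of stringLength.", nameof(buffer));

        var result = new string[buffer.Length / stringLength];
        for (int i = 0; i < result.Length; i++)
        {
            // Decode one slot, then drop the '\0' padding that fills short entries.
            // Assumption: the file pads with nulls; space-padded files need TrimEnd(' ').
            result[i] = encoding.GetString(buffer, i * stringLength, stringLength).TrimEnd('\0');
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Two 4-byte slots: "ab" padded with two nulls, then "wxyz".
        byte[] buffer = { 97, 98, 0, 0, 119, 120, 121, 122 };
        string[] names = FixedLengthStrings.Decode(buffer, 4, Encoding.ASCII);
        Console.WriteLine(string.Join(",", names)); // prints "ab,wxyz"
    }
}
```

Decoding per slot on the byte level also avoids splitting a multi-byte UTF-8 character across two parts, which can happen when the text is decoded first and cut by character count afterwards.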

Related

How to write a text file with specific positions with FileStream?

I am trying to write a text file with this structure
For this I am working with Seek() and Write(), but every time I add new data the previous data is overwritten, and in the end only one line is written to the file.
This is the code I am using:
private async void CreacionArchivoBCD(ArchivoPerfil contenidoPerfil, string nombreFinalBCD, bool checkPrevia)
{
var result = await ensayoDataProvider.ObtenerUexCapturador(checkPrevia, contenidoPerfil, codensSeleccionado.Id);
var cant = System.Text.Encoding.UTF8.GetBytes(Convert.ToString(result.Count()));
////TextWriter archivoExportar = new StreamWriter(nombreFinalBCD);
////archivoExportar.Close();
var file = new FileStream(nombreFinalBCD, FileMode.OpenOrCreate, FileAccess.Write);
file.Seek(1, SeekOrigin.Begin);
file.Write(cant, 0, cant.Length);
byte[] newline = Encoding.ASCII.GetBytes(Environment.NewLine);
file.Write(newline, 0, newline.Length);
int nroMedAux = 0;
long rutaAux = 0;
var cont = 0;
var bandera = 0;
//var posicion = 0;
foreach (var item in result)
{
var primerByte = System.Text.Encoding.UTF8.GetBytes("00");
file.Seek(0, SeekOrigin.Begin);
file.Write(primerByte, 0, primerByte.Length);
if (bandera == 0)
{
nroMedAux = item.IdMed;
rutaAux = item.Ruta;
bandera = 1;
}
var tipo = item.GetType();
if (nroMedAux != item.IdMed || rutaAux != item.Ruta)
{
nroMedAux = item.IdMed;
rutaAux = item.Ruta;
file.Write(newline, 0, newline.Length);
}
foreach (var pi in tipo.GetProperties())
{
var propName = pi.Name;
var propValue = pi.GetValue(item, null)?.ToString();
var propValueAux = propName == "Campo" ? item.GetType().GetProperty("Valor").GetValue(item, null) : propValue;
if (propName == "IdMed" || propName == "Valor")
{
continue;
}
//TODO: buscar posicion y ancho por nombre campo
var posicion = BuscarPosicion(propName, propValue, contenidoPerfil);
//var ancho = BuscarAncho(propName, contenidoPerfil);
Debug.WriteLine(propName);
Debug.WriteLine(propValueAux);
Debug.WriteLine(posicion);
if (posicion == -1)
{
continue;
}
var lacadena = propValue;
var lacadenAux = propName == "Campo" ? propValueAux : propValue;
var cadena = System.Text.Encoding.UTF8.GetBytes(lacadenAux.ToString());
file.Seek(posicion, SeekOrigin.Begin);
file.Write(cadena, 0, cadena.Length);
}
}
file.Close();
}

This is not how you should do it. Since you are writing text to a file, you should use a StreamWriter.
Then to create the column formatting, simply use string.PadLeft (or string.PadRight).
To handle streams (or IDisposable implementations in general) use the using statement. Don't close resources explicitly.
Also use the async API to improve the performance.
The following example shows the pattern you should use:
1. Generate the data structure (rows)
2. Format the data
3. Write all data at once to the file
The algorithm takes different cell value lengths into account to generate even columns.
If you use .NET Standard 2.1 (.NET Core 3.0, .NET 5) you can even make use of IAsyncDisposable.
private async Task WriteDataTableToFileAsync(string filePath)
{
List<List<string>> rows = GenerateData();
string fileContent = FormatData("4080", rows, 4);
await WriteToFileAsync(fileContent, filePath);
}
private List<List<string>> GenerateData()
{
// Generate the complete data first
var rows = new List<List<string>>
{
// Row 1
new List<string>
{
// Cells
"00",
"1",
"LD126",
"NN",
"1",
"0",
"0 0",
"49",
"2"
},
// Row 2
new List<string>
{
// Cells
"00",
"1",
"Rell",
"NN",
"1",
"0",
"0 0",
"49",
"2"
}
};
return rows;
}
private string FormatData(string preamble, List<List<string>> rows, int columnGapWidth)
{
preamble = preamble.PadLeft(preamble.Length + 1) + Environment.NewLine;
var fileContentBuilder = new StringBuilder(preamble);
var columnWidths = new Dictionary<int, int>();
int columnCount = rows.First().Count;
foreach (List<string> cells in rows)
{
var rowBuilder = new StringBuilder();
for (int columnIndex = 0; columnIndex < columnCount; columnIndex++)
{
if (!columnWidths.TryGetValue(columnIndex, out int columnWidth))
{
int maxCellWidthOfColumn = rows
.Select(row => row[columnIndex]) // 'row', because 'cells' is already the foreach variable
.Max(cell => cell.Length);
columnWidth = maxCellWidthOfColumn + columnGapWidth;
columnWidths.Add(columnIndex, columnWidth);
}
string cell = cells[columnIndex];
rowBuilder.Append(cell.PadRight(columnWidth));
}
fileContentBuilder.Append(rowBuilder.ToString().TrimEnd());
fileContentBuilder.Append(Environment.NewLine);
}
string fileContent = fileContentBuilder.ToString().TrimEnd();
return fileContent;
}
private async Task WriteToFileAsync(string fileContent, string filePath)
{
await using var destinationFileStream = File.Open(filePath, FileMode.Create, FileAccess.Write);
await using var streamWriter = new StreamWriter(destinationFileStream);
await streamWriter.WriteAsync(fileContent);
}
You can further improve the implementation: add a Data, Row and a Cell class. The Row class has a Cell collection. The Data class has a Row collection and additional data like the preamble or formatting info like the column gap width. This way you eliminate method parameters to improve readability. Then replace the current List<List<string>> data structure with the Data class.
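As a rough illustration of that suggestion, here is a minimal sketch of what the Data, Row and Cell classes could look like. The member names (Preamble, ColumnGapWidth, Format, Row.Of) are assumptions made for this example, and the formatting follows the same idea as FormatData above: pad each cell to the widest value in its column plus the gap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public sealed class Cell
{
    public Cell(string value) { Value = value ?? string.Empty; }
    public string Value { get; }
}

public sealed class Row
{
    public List<Cell> Cells { get; } = new List<Cell>();

    // Hypothetical convenience factory for building a row from plain strings.
    public static Row Of(params string[] values)
    {
        var row = new Row();
        foreach (string value in values) row.Cells.Add(new Cell(value));
        return row;
    }
}

public sealed class Data
{
    public string Preamble { get; set; } = string.Empty;
    public int ColumnGapWidth { get; set; } = 4;
    public List<Row> Rows { get; } = new List<Row>();

    public string Format()
    {
        // Preamble line with one leading space, as in FormatData above.
        var builder = new StringBuilder(" " + Preamble + Environment.NewLine);
        if (Rows.Count == 0) return builder.ToString().TrimEnd();

        // Column width = widest cell in that column + gap.
        var widths = new int[Rows.Max(row => row.Cells.Count)];
        foreach (Row row in Rows)
            for (int i = 0; i < row.Cells.Count; i++)
                widths[i] = Math.Max(widths[i], row.Cells[i].Value.Length + ColumnGapWidth);

        foreach (Row row in Rows)
        {
            var line = new StringBuilder();
            for (int i = 0; i < row.Cells.Count; i++)
                line.Append(row.Cells[i].Value.PadRight(widths[i]));
            builder.Append(line.ToString().TrimEnd()).Append(Environment.NewLine);
        }
        return builder.ToString().TrimEnd();
    }
}

public static class Program
{
    public static void Main()
    {
        var data = new Data { Preamble = "4080", ColumnGapWidth = 4 };
        data.Rows.Add(Row.Of("00", "1", "LD126", "NN"));
        data.Rows.Add(Row.Of("00", "1", "Rell", "NN"));
        Console.WriteLine(data.Format());
    }
}
```

With this shape, the preamble and the gap width travel with the data instead of being passed as method parameters, and the string returned by Format() can be handed to WriteToFileAsync unchanged.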

Before file.Close(), add file.Flush() or await file.FlushAsync(). This flushes the buffer to the output.
See https://learn.microsoft.com/en-us/dotnet/api/system.io.filestream.flush?view=net-6.0.

Reading from file with different line length

Does anyone know how to read from a file into an array (a container with inheritance) when the lines have different lengths? (I hope the language barrier won't cause any problems :))
Sportas Skaitymas(Sportas sportas)
{
SportininkasCointainer sportininkai = new SportininkasCointainer();
KomandaContainer komandos = new KomandaContainer();
using (StreamReader reader = new StreamReader("Duomenys.txt"))
{
string line = null;
while ((line = reader.ReadLine()) != null)
{
string[] values;
values = line.Split(';');
string a = values[0];
string b = values[1];
string c = values[2];
string d = values[3];
string e = values[4];
string f = values[5];
string g = values[6];
switch (values.Length)
{
case 7:
Krepsininkas krepsininkas = new Krepsininkas(a, b, c, int.Parse(d), int.Parse(e), int.Parse(f), int.Parse(g));
sportininkai.AddSportinkas(krepsininkas as Krepsininkas);
break;
case 6:
Futbolininkas futbolininkas = new Futbolininkas(a, b, c, int.Parse(d), int.Parse(e), int.Parse(f));
sportininkai.AddSportinkas(futbolininkas as Futbolininkas);
break;
case 4:
Komanda komanda = new Komanda(a, b, c, int.Parse(d));
komandos.AddKomanda(komanda);
break;
}
}
}
return sportas;
}
I would be very grateful :)

You can use:
string a = values.Length > 0 ? values[0] : "";
string b = values.Length > 1 ? values[1] : "";
Here, each string is set only when the value is present; otherwise it will be "". The same check applies to the remaining fields (c through g).
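An alternative to guarding every field is to look at values.Length first and only then index into the array, so that a short line never touches a missing position. The sketch below shows both ideas, assuming the field counts from the question (7, 6 and 4). RecordParser, FieldOrEmpty and Classify are hypothetical names, and the returned strings merely stand in for constructing the real objects.

```csharp
using System;

public static class RecordParser
{
    // Safe accessor: returns "" when the line has fewer than index + 1 fields.
    public static string FieldOrEmpty(string[] values, int index)
    {
        return index < values.Length ? values[index] : "";
    }

    // Decides the record kind from the field count before any index is read.
    public static string Classify(string line)
    {
        string[] values = line.Split(';');
        switch (values.Length)
        {
            case 7: return "Krepsininkas";   // safe to read values[0]..values[6] here
            case 6: return "Futbolininkas";  // safe to read values[0]..values[5] here
            case 4: return "Komanda";        // safe to read values[0]..values[3] here
            default: return "Unknown";       // malformed line: skip or report it
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(RecordParser.Classify("a;b;c;1"));          // prints "Komanda"
        Console.WriteLine(RecordParser.Classify("a;b;c;1;2;3;4"));    // prints "Krepsininkas"

        string[] shortLine = "a;b".Split(';');
        Console.WriteLine(RecordParser.FieldOrEmpty(shortLine, 1));         // prints "b"
        Console.WriteLine(RecordParser.FieldOrEmpty(shortLine, 5).Length);  // prints "0"
    }
}
```

In the original method this would mean moving the reads of values[4], values[5] and values[6] into the matching case branches.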

Google Reporting API V4 Missing Values

I've been having a problem with Google's Analytics Reporting API v4. When I make a request, I get data back, but some dimension and metric values are missing and/or inconsistent.
For example, if I request ga:fullReferrer, it returns (not set). Or, when I do get values, my page views could be 1312 while my sessions are 26.
My code for making the request is below:
public GetReportsResponse Get(string viewId, DateTime startDate, DateTime endDate, string nextPageToken = null)
{
try
{
var credential = GetCredential();
using (var svc = new AnalyticsReportingService(
new BaseClientService.Initializer
{
HttpClientInitializer = credential
}))
{
var mets = new List<Metric>
{
new Metric
{
Alias = "Users",
Expression = "ga:users"
},
new Metric
{
Alias = "Bounce Rate",
Expression = "ga:bounceRate"
},
new Metric
{
Alias = "Page Views",
Expression = "ga:pageViews"
},
new Metric()
{
Alias = "Sessions",
Expression = "ga:sessions"
}
};
var dims = new List<Dimension>
{
new Dimension { Name = "ga:date" },
new Dimension { Name = "ga:hour" },
new Dimension { Name = "ga:browser" },
new Dimension { Name = "ga:pagePath" },
new Dimension { Name = "ga:fullReferrer"}
};
var dateRange = new DateRange
{
StartDate = startDate.ToFormattedString(),
EndDate = endDate.ToFormattedString()
};
var reportRequest = new ReportRequest
{
DateRanges = new List<DateRange> { dateRange },
Dimensions = dims,
Metrics = mets,
ViewId = viewId,
PageToken = nextPageToken
};
var getReportsRequest = new GetReportsRequest
{
ReportRequests = new List<ReportRequest> { reportRequest },
};
var batchRequest = svc.Reports.BatchGet(getReportsRequest);
var response = batchRequest.Execute();
return response;
}
}
catch (Exception e)
{
return null;
}
}
And my code for filtering the results is here:
public static List<AnalyticEntry> Filter(Google.Apis.AnalyticsReporting.v4.Data.GetReportsResponse response)
{
if (response == null) return new List<AnalyticEntry>();
List<GoogleDataDto> gData = new List<GoogleDataDto>();
foreach (var report in response.Reports)
{
foreach (var row in report.Data.Rows)
{
GoogleDataDto dto = new GoogleDataDto();
foreach (var metric in row.Metrics)
{
foreach (var value in metric.Values)
{
int index = metric.Values.IndexOf(value);
var metricHeader = report.ColumnHeader.MetricHeader.MetricHeaderEntries[index];
switch (metricHeader.Name)
{
case "Sessions":
dto.Sessions = Convert.ToInt32(value);
break;
case "Bounce Rate":
dto.BounceRate = Convert.ToDecimal(value);
break;
case "Page Views":
dto.PageViews = Convert.ToInt32(value);
break;
case "Users":
dto.Users = Convert.ToInt32(value);
break;
}
}
}
foreach (var dimension in row.Dimensions)
{
int index = row.Dimensions.IndexOf(dimension);
var dimensionName = report.ColumnHeader.Dimensions[index];
switch (dimensionName)
{
case "ga:date":
dto.Date = dimension;
break;
case "ga:hour":
dto.Hour = dimension;
break;
case "ga:browser":
dto.Browser = dimension;
break;
case "ga:pagePath":
dto.PagePath = dimension;
break;
case "ga:source":
dto.Source = dimension;
break;
case "ga:fullRefferer":
dto.Referrer = dimension;
break;
}
}
gData.Add(dto);
}
}
return Combine(gData);
}
private static List<AnalyticEntry> Combine(IReadOnlyCollection<GoogleDataDto> gData)
{
List<AnalyticEntry> outputDtos = new List<AnalyticEntry>();
var dates = gData.GroupBy(d => d.Date.Substring(0,6)).Select(d => d.First()).Select(d => d.Date.Substring(0,6)).ToList();
foreach (var date in dates)
{
var entities = gData.Where(d => d.Date.Contains(date)).ToList();
AnalyticEntry dto = new AnalyticEntry
{
Date = date.ToDate(),
PageViews = 0,
Sessions = 0,
Users = 0,
BounceRate = 0
};
foreach (var entity in entities)
{
dto.BounceRate += entity.BounceRate;
dto.PageViews += entity.PageViews;
dto.Users += entity.Users;
dto.Sessions += entity.Sessions;
}
dto.BounceRate = dto.BounceRate / entities.Count();
var dictionaries = entities.GetDictionaries();
var commonBrowsers = dictionaries[0].GetMostCommon();
var commonTimes = dictionaries[1].GetMostCommon();
var commonPages = dictionaries[2].GetMostCommon();
var commonSources = dictionaries[3].GetMostCommon();
var commonReferrers = dictionaries[4].GetMostCommon();
dto.CommonBrowser = commonBrowsers.Key;
dto.CommonBrowserViews = commonBrowsers.Value;
dto.CommonTimeOfDay = commonTimes.Key.ToInt();
dto.CommonTimeOfDayViews = commonTimes.Value;
dto.CommonPage = commonPages.Key;
dto.CommonPageViews = commonPages.Value;
dto.CommonSource = commonSources.Key;
dto.CommonSourceViews = commonSources.Value;
dto.CommonReferrer = commonReferrers.Key;
dto.CommonReferrerViews = commonReferrers.Value;
outputDtos.Add(dto);
}
return outputDtos;
}
I'm not sure what else to put, please comment for more info :)

Solved!
Originally I was trying to find a metric name based on the position of a value in an array: I looked up the value's index and used that index to get the name and set the value.
The problem was that the array could contain the same value more than once, and IndexOf always returns the position of the first match.
For example:
var arr = [1,0,3,1,1];
If a value was 1, I used the index of 1 in the array to get a name.
So if the index of 1 in the array was 0, I would look up the name at index 0 of another array.
For example:
var names = ['a','b','c'];
var values = [1,2,1];
var value = 1;
var index = values.indexOf(value); // which would be 0
SetProperty(
propertyName:names[index], // being a
value: value);
Although it's hard to explain, I was setting the same property multiple times because more than one value in the array was equal to the same thing.
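The effect is easy to reproduce in C# without the Google client library. The following sketch pairs names with values twice, once through IndexOf and once with a running index; IndexOfPitfall and its two methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfPitfall
{
    // Buggy pairing: IndexOf returns the FIRST position holding an equal value,
    // so every repeated value is mapped to the name of its first occurrence.
    public static List<string> PairWithIndexOf(IList<string> names, IList<int> values)
    {
        var pairs = new List<string>();
        foreach (int value in values)
            pairs.Add(names[values.IndexOf(value)] + "=" + value);
        return pairs;
    }

    // Correct pairing: the loop counter is the position, no lookup needed.
    public static List<string> PairWithCounter(IList<string> names, IList<int> values)
    {
        var pairs = new List<string>();
        for (int i = 0; i < values.Count; i++)
            pairs.Add(names[i] + "=" + values[i]);
        return pairs;
    }
}

public static class Program
{
    public static void Main()
    {
        string[] names = { "a", "b", "c" };
        int[] values = { 1, 2, 1 };

        // The third value is wrongly attributed to "a" instead of "c".
        Console.WriteLine(string.Join(" ", IndexOfPitfall.PairWithIndexOf(names, values))); // prints "a=1 b=2 a=1"
        Console.WriteLine(string.Join(" ", IndexOfPitfall.PairWithCounter(names, values))); // prints "a=1 b=2 c=1"
    }
}
```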
Here is the answer. Tested and works
public List<GoogleDataDto> Filter(GetReportsResponse response)
{
if (response == null) return null;
List<GoogleDataDto> gData = new List<GoogleDataDto>();
foreach (var report in response.Reports)
{
foreach (var row in report.Data.Rows)
{
GoogleDataDto dto = new GoogleDataDto();
foreach (var metric in row.Metrics)
{
int index = 0; // Index counter, used to get the metric name
foreach (var value in metric.Values)
{
var metricHeader = report.ColumnHeader.MetricHeader.MetricHeaderEntries[index];
//Sets property value based on the metric name
dto.SetMetricValue(metricHeader.Name, value);
index++;
}
}
int dIndex = 0; // Used to get dimension name
foreach (var dimension in row.Dimensions)
{
var dimensionName = report.ColumnHeader.Dimensions[dIndex];
//Sets property value based on dimension name
dto.SetDimensionValue(dimensionName, dimension);
dIndex++;
}
// Will only add the dto to the list if its not a duplicate
if (!gData.IsDuplicate(dto))
gData.Add(dto);
}
}
return gData;
}

working Faster-rcnn in cntk c#?

I'm trying to get a Faster R-CNN model working in C# code.
I have a trained Faster R-CNN model that builds and tests in the CNTK Python code. Below is my attempt to get it working from C#.
So far I have:
string testImage = @"C:\data\images\mytestimage.jpg";
Bitmap bitmap = new Bitmap(Bitmap.FromFile(testImage ));
DeviceDescriptor device = DeviceDescriptor.CPUDevice;
Function modelFunc = Function.Load(modelPath, device);
var inDoims = modelFunc.Arguments;
Variable inputVar = modelFunc.Arguments.FirstOrDefault();
Variable inputVar2 = modelFunc.Arguments[1];
NDShape inputShape = inputVar.Shape;
int imageWidth = inputShape[0];
int imageHeight = inputShape[1];
int imageChannels = inputShape[2];
int imageSize = inputShape.TotalSize;
bitmap = ImageProcessing.Resize(bitmap, imageWidth, imageHeight, true);
var pixels = ImageProcessing.ParallelExtractCHW(bitmap);
Variable input = modelFunc.Arguments[0];
var inputDataMap = new Dictionary<Variable, Value>();
List<float> input2Vals = new List<float>();
input2Vals.Add(imageWidth);
input2Vals.Add(imageHeight);
input2Vals.Add(imageWidth);
input2Vals.Add(imageHeight);
input2Vals.Add(imageWidth);
input2Vals.Add(imageHeight);
Value inputVal = Value.CreateBatch(inputVar.Shape, pixels, device);
Value inputValue2 = Value.CreateBatch(inputVar2.Shape, input2Vals, device);
inputDataMap[input] = inputVal;
inputDataMap[inputVar2] = inputValue2;
NDShape outputShape1 = modelFunc.Outputs[0].Shape;
NDShape outputShape2 = modelFunc.Outputs[1].Shape;
NDShape outputShape3 = modelFunc.Outputs[2].Shape;
Value outputValue1 = null;
Value outputValue2 = null;
Value outputValue3 = null;
var outputDataMap = new Dictionary<Variable, Value>()
{
{ modelFunc.Outputs[0], outputValue1 },
{ modelFunc.Outputs[1], outputValue2 },
{ modelFunc.Outputs[2], outputValue3 }
};
//run the model
modelFunc.Evaluate(inputDataMap, outputDataMap, device);
var out0 = outputDataMap[modelFunc.Outputs[0]];
var out1 = outputDataMap[modelFunc.Outputs[1]];
var out2 = outputDataMap[modelFunc.Outputs[2]];
var clsPred = out0.GetDenseData<float>(modelFunc.Outputs[0])[0];
var rois = out1.GetDenseData<float>(modelFunc.Outputs[1])[0];
var vbboxR = out2.GetDenseData<float>(modelFunc.Outputs[2])[0];
var labels = new[] { "__background__", "firstobject", "secondobject", "thirdsobject"};
....
}
But the results I get back in clsPred don't make any sense. For example, when my test image is completely blank, the model seems to think there are objects in it.
Does anyone have any recommendations?
Buzz

Array.Sort for strings with numbers [duplicate]

I have the sample code below:
List<string> test = new List<string>();
test.Add("Hello2");
test.Add("Hello1");
test.Add("Welcome2");
test.Add("World");
test.Add("Hello11");
test.Add("Hello10");
test.Add("Welcome0");
test.Add("World3");
test.Add("Hello100");
test.Add("Hello20");
test.Add("Hello3");
test.Sort();
But what happens is that test.Sort() sorts the list to:
"Hello1",
"Hello10",
"Hello100",
"Hello11",
"Hello2",
"Hello20",
"Hello3",
"Welcome0",
"Welcome2",
"World",
"World3"
Is there any way to sort them so that the strings are also in the correct numeric order?
(If there is no number at the end of a string, that string should come first among the strings that are otherwise alphabetically equal.)
Expected output:
"Hello1",
"Hello2",
"Hello3",
"Hello10",
"Hello11",
"Hello20",
"Hello100",
"Welcome0",
"Welcome2",
"World",
"World3"

Here is one possible way using LINQ:
var orderedList = test
.OrderBy(x => new string(x.Where(char.IsLetter).ToArray()))
.ThenBy(x =>
{
int number;
if (int.TryParse(new string(x.Where(char.IsDigit).ToArray()), out number))
return number;
return -1;
}).ToList();

Create an IComparer<string> implementation. The advantage of doing it this way over the LINQ suggestions is that you now have a class that can be passed to anything that needs to sort in this fashion, rather than recreating that LINQ query in other locations.
The first version is for calling Sort on a List<string>. If you want to call Array.Sort() instead, please see version two:
List Version:
public class AlphaNumericComparer : IComparer<string>
{
public int Compare(string lhs, string rhs)
{
if (lhs == null)
{
return 0;
}
if (rhs == null)
{
return 0;
}
var s1Length = lhs.Length;
var s2Length = rhs.Length;
var s1Marker = 0;
var s2Marker = 0;
// Walk through the two strings with two markers.
while (s1Marker < s1Length && s2Marker < s2Length)
{
var ch1 = lhs[s1Marker];
var ch2 = rhs[s2Marker];
var s1Buffer = new char[s1Length];
var loc1 = 0;
var s2Buffer = new char[s2Length];
var loc2 = 0;
// Walk through all following characters that are digits or
// characters in BOTH strings starting at the appropriate marker.
// Collect char arrays.
do
{
s1Buffer[loc1++] = ch1;
s1Marker++;
if (s1Marker < s1Length)
{
ch1 = lhs[s1Marker];
}
else
{
break;
}
} while (char.IsDigit(ch1) == char.IsDigit(s1Buffer[0]));
do
{
s2Buffer[loc2++] = ch2;
s2Marker++;
if (s2Marker < s2Length)
{
ch2 = rhs[s2Marker];
}
else
{
break;
}
} while (char.IsDigit(ch2) == char.IsDigit(s2Buffer[0]));
// If we have collected numbers, compare them numerically.
// Otherwise, if we have strings, compare them alphabetically.
string str1 = new string(s1Buffer);
string str2 = new string(s2Buffer);
int result;
if (char.IsDigit(s1Buffer[0]) && char.IsDigit(s2Buffer[0]))
{
var thisNumericChunk = int.Parse(str1);
var thatNumericChunk = int.Parse(str2);
result = thisNumericChunk.CompareTo(thatNumericChunk);
}
else
{
result = str1.CompareTo(str2);
}
if (result != 0)
{
return result;
}
}
return s1Length - s2Length;
}
}
Call like so:
test.Sort(new AlphaNumericComparer());
//RESULT
Hello1
Hello2
Hello3
Hello10
Hello11
Hello20
Hello100
Welcome0
Welcome2
World
World3
Array.Sort version:
Create class:
public class AlphaNumericComparer : IComparer
{
public int Compare(object x, object y)
{
string s1 = x as string;
if (s1 == null)
{
return 0;
}
string s2 = y as string;
if (s2 == null)
{
return 0;
}
int len1 = s1.Length;
int len2 = s2.Length;
int marker1 = 0;
int marker2 = 0;
// Walk through the two strings with two markers.
while (marker1 < len1 && marker2 < len2)
{
var ch1 = s1[marker1];
var ch2 = s2[marker2];
// Some buffers we can build up characters in for each chunk.
var space1 = new char[len1];
var loc1 = 0;
var space2 = new char[len2];
var loc2 = 0;
// Walk through all following characters that are digits or
// characters in BOTH strings starting at the appropriate marker.
// Collect char arrays.
do
{
space1[loc1++] = ch1;
marker1++;
if (marker1 < len1)
{
ch1 = s1[marker1];
}
else
{
break;
}
} while (char.IsDigit(ch1) == char.IsDigit(space1[0]));
do
{
space2[loc2++] = ch2;
marker2++;
if (marker2 < len2)
{
ch2 = s2[marker2];
}
else
{
break;
}
} while (char.IsDigit(ch2) == char.IsDigit(space2[0]));
// If we have collected numbers, compare them numerically.
// Otherwise, if we have strings, compare them alphabetically.
var str1 = new string(space1);
var str2 = new string(space2);
var result = 0;
if (char.IsDigit(space1[0]) && char.IsDigit(space2[0]))
{
var thisNumericChunk = int.Parse(str1);
var thatNumericChunk = int.Parse(str2);
result = thisNumericChunk.CompareTo(thatNumericChunk);
}
else
{
result = str1.CompareTo(str2);
}
if (result != 0)
{
return result;
}
}
return len1 - len2;
}
}
Call like so:
This time test is an array instead of a list.
Array.Sort(test, new AlphaNumericComparer());
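As a side note, Array.Sort also has a generic overload that accepts an IComparer<T>, so a single generic comparer can serve both List<string>.Sort and Array.Sort on a string[]. The sketch below demonstrates this with a deliberately simpler comparer than the one above: TrailingNumberComparer is a hypothetical class that only looks at one number at the end of the string and compares the text before it ordinally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class TrailingNumberComparer : IComparer<string>
{
    // ASCII digits at the very end of the string.
    private static readonly Regex TrailingDigits = new Regex("[0-9]+$");

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        Match mx = TrailingDigits.Match(x);
        Match my = TrailingDigits.Match(y);
        string prefixX = mx.Success ? x.Substring(0, mx.Index) : x;
        string prefixY = my.Success ? y.Substring(0, my.Index) : y;

        // Primary key: the text before the trailing number.
        int byPrefix = string.CompareOrdinal(prefixX, prefixY);
        if (byPrefix != 0) return byPrefix;

        // Secondary key: the number; -1 puts strings without a number first.
        // Assumption: the digit run fits into a long.
        long numberX = mx.Success ? long.Parse(mx.Value) : -1;
        long numberY = my.Success ? long.Parse(my.Value) : -1;
        return numberX.CompareTo(numberY);
    }
}

public static class Program
{
    public static void Main()
    {
        var comparer = new TrailingNumberComparer();

        var list = new List<string> { "Hello2", "Hello10", "World", "Hello1", "World3" };
        list.Sort(comparer);

        string[] array = { "Hello2", "Hello10", "World", "Hello1", "World3" };
        Array.Sort(array, comparer); // generic overload Array.Sort<T>(T[], IComparer<T>)

        Console.WriteLine(string.Join(",", list));  // prints "Hello1,Hello2,Hello10,World,World3"
        Console.WriteLine(string.Join(",", array)); // prints "Hello1,Hello2,Hello10,World,World3"
    }
}
```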

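You can use LINQ combined with a regex to ensure that only numbers occurring at the end of the string are used for the secondary ordering. The primary ordering uses the text before that number; ordering by the whole string first would leave nothing for the numeric ordering to decide:
test
.Select(t => new{match = Regex.Match(t, @"\d+$"), val = t})
.Select(x => new{prefix = x.match.Success
?x.val.Substring(0, x.match.Index)
:x.val,
sortVal = x.match.Success
?int.Parse(x.match.Value)
:-1,
val = x.val})
.OrderBy(x => x.prefix)
.ThenBy(x => x.sortVal)
.Select(x => x.val)
.ToList()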
