I have a string of fixed length that has to be split at variable positions along the string to yield the substrings.
30849162 AUF3063100-2022031Doe Deanne 2610194031482100720081007200820000000000G43Z4206372 10 8 98282000000000911140000 00000000K6358Z8643K638 D126 Z099 320930090308009251519 132093 100720080071 0000000000000000000000000000000000000000000000000000000000000000000000002022031 000000000000000000000000000000000000000000000 00000000
The column break points are:
15, 18, 33, 61, 81, 89, 93, 94, 102, 110, 111, 114, 118,
Does anyone have an idea how I might do this? I have literally thousands of lines to parse.
Put the break points in an array and use .substring() in a loop through those numbers. This is roughly how you want to do it, though you will have to adjust it to compensate for exactly where you want your column breaks.
int[] nums = {0, 15, 18, 33, 61, 81, 89, 93, 94, 102, 110, 111, 114, 118 };
string input = "Long string here";
for (int i = 0; i < nums.Length - 1; i++)
{
Console.WriteLine(input.Substring(nums[i], nums[i + 1] - nums[i]));
}
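Since there are thousands of lines to parse, some of them may turn out shorter than the last break point, in which case Substring would throw. Below is a minimal sketch of the same loop wrapped in a reusable function that truncates instead of throwing. The name SplitAt and the sample string are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: cuts `line` at the given ascending break points.
// Fields that would run past the end of the line are truncated or empty.
List<string> SplitAt(string line, int[] breaks)
{
    var fields = new List<string>();
    int start = 0;
    foreach (int end in breaks)
    {
        int from = Math.Min(start, line.Length);
        int to = Math.Min(end, line.Length);
        fields.Add(line.Substring(from, to - from));
        start = end;
    }
    return fields;
}

// Short sample: the last break point (12) lies beyond the 10-character line.
List<string> sampleFields = SplitAt("ABCDEFGHIJ", new[] { 3, 5, 12 });
foreach (string field in sampleFields)
{
    Console.WriteLine($"[{field}]");
}
// [ABC]
// [DE]
// [FGHIJ]
```

Whether truncating is the right reaction to a short line depends on the data; throwing or logging the line may be preferable if every record is supposed to have the full length.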
Or you could use some nasty LINQ like so..
public string[] ReturnMyStrings(string str)
{
int[] br = { 15, 18, 33, 61, 81, 89, 93, 94, 102, 110, 111, 114, 118 };
return br.Select((x, i) =>
str.Substring(br.ElementAtOrDefault(i - 1), x - br.ElementAtOrDefault(i - 1)))
.ToArray();
}
If you wanted to make your code scalable, you could implement some classes to do this work.
static void Main(string[] args)
{
string inputString = "30849162 AUF3063100-2022031Doe Deanne " +
"2610194031482100720081007200820000000000G43Z4" +
"206372 10 8 98282000000000911140000 00000000K" +
"6358Z8643K638 D126 Z099 320930090308009251519" +
"132093 100720080071 0000000000000000000000000" +
"000000000000000000000000000000000000000000000" +
"002022031 00000000000000000000000000000000000" +
"0000000000 00000000";
//myRecord will hold the entire input in its split form
var myRecord = new StringSplitterRecord()
{
fields = new List<StringSplitterField>()
{
//define all the different fields
new StringSplitterField(inputString, 0, 15, "Name of field 1"),
new StringSplitterField(inputString, 15, 3, "Name of field 2"),
new StringSplitterField(inputString, 18, 15, "Name of field 3"),
new StringSplitterField(inputString, 33, 28, "Name of field 4"),
new StringSplitterField(inputString, 61, 20, "Name of field 5"),
new StringSplitterField(inputString, 81, 8, "Name of field 6"),
new StringSplitterField(inputString, 89, 4, "Name of field 7"),
new StringSplitterField(inputString, 93, 1, "Name of field 8"),
new StringSplitterField(inputString, 94, 8, "Name of field 9"),
new StringSplitterField(inputString, 102, 8, "Name of field 10"),
new StringSplitterField(inputString, 110, 1, "Name of field 11"),
new StringSplitterField(inputString, 111, 3, "Name of field 12"),
new StringSplitterField(inputString, 114, 4, "Name of field 13"),
}
};
}
class StringSplitterRecord
{
public List<StringSplitterField> fields;
}
class StringSplitterField
{
// Read-only properties so the split values can be queried later (e.g. with LINQ)
public string Contents { get; }
public string FieldType { get; }
public StringSplitterField(string originalString, int startLocation, int length, string fieldType)
{
Contents = originalString.Substring(startLocation, length);
FieldType = fieldType;
}
}
This will not only split your input string into the required pieces, it will also put them all in a list with a name for each subsection. You can then use LINQ etc. to retrieve the data that you need.
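To illustrate the "retrieve by name" idea without the extra classes, here is a minimal sketch that describes the layout as (name, start, length) tuples and builds a dictionary with LINQ. The field names, the layout, and the sample record are all hypothetical; the real column names are not given in the question.

```csharp
using System;
using System.Linq;

// Hypothetical layout: (name, start, length) for a short made-up record.
var layout = new (string Name, int Start, int Length)[]
{
    ("Id", 0, 3),
    ("Code", 3, 2),
    ("Name", 5, 5),
};

string record = "123ABSmith";

// Cut every field out of the record and index the results by field name.
var fieldsByName = layout.ToDictionary(
    f => f.Name,
    f => record.Substring(f.Start, f.Length));

Console.WriteLine(fieldsByName["Name"]); // Smith
```

Keeping the layout as data rather than as repeated constructor calls means a change in column widths touches one table only.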
Initial DataFrame in Pandas
Let's suppose we have the following in Python with pandas:
import pandas as pd
df = pd.DataFrame({
"Col1": [10, 20, 15, 30, 45],
"Col2": [13, 23, 18, 33, 48],
"Col3": [17, 27, 22, 37, 52] },
index=pd.date_range("2020-01-01", "2020-01-05"))
df
Here's what we get in Jupyter:
Shifting columns
Now let's shift Col1 by 2 and store it in Col4.
We'll also store df['Col1'] / df['Col1'].shift(2) in Col5:
df_2 = df.copy(deep=True)
df_2['Col4'] = df['Col1'].shift(2)
df_2['Col5'] = df['Col1'] / df['Col1'].shift(2)
df_2
The result:
C# version
Now let's set up a similar DataFrame in C#:
#r "nuget:Microsoft.Data.Analysis"
using Microsoft.Data.Analysis;
var df = new DataFrame(
new PrimitiveDataFrameColumn<DateTime>("DateTime",
Enumerable.Range(0, 5).Select(i => new DateTime(2020, 1, 1).Add(new TimeSpan(i, 0, 0, 0)))),
new PrimitiveDataFrameColumn<int>("Col1", new []{ 10, 20, 15, 30, 45 }),
new PrimitiveDataFrameColumn<int>("Col2", new []{ 13, 23, 18, 33, 48 }),
new PrimitiveDataFrameColumn<int>("Col3", new []{ 17, 27, 22, 37, 52 })
);
df
The result in .NET Interactive:
Question
What's a good way to perform the equivalent column shifts as demonstrated in the Pandas version?
Notes
The above example is from the documentation for pandas.DataFrame.shift:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html
Update
It does indeed look like there isn't currently a built-in shift in Microsoft.Data.Analysis. I've posted an issue for this here:
https://github.com/dotnet/machinelearning/issues/6008
Helper functions
Perform a column shift.
PrimitiveDataFrameColumn<double> ShiftIntColumn(PrimitiveDataFrameColumn<int> col, int n, string name)
{
return
new PrimitiveDataFrameColumn<double>(
name,
Enumerable.Repeat((double?) null, n)
.Concat(col.Select(item => (double?) item))
.Take(col.Count()));
}
Carry out division, taking care of null values in divisor.
PrimitiveDataFrameColumn<double> DivAlt3(PrimitiveDataFrameColumn<int> a, PrimitiveDataFrameColumn<double> b, string name)
{
return
new PrimitiveDataFrameColumn<double>(name, a.Zip(b, (x, y) => y == null ? null : x / y));
}
Then the following:
var df = new DataFrame(
new PrimitiveDataFrameColumn<DateTime>("DateTime",
Enumerable.Range(0, 5).Select(i =>
new DateTime(2020, 1, 1).Add(new TimeSpan(i, 0, 0, 0)))),
new PrimitiveDataFrameColumn<int>("Col1", new []{ 10, 20, 15, 30, 45 }),
new PrimitiveDataFrameColumn<int>("Col2", new []{ 13, 23, 18, 33, 48 }),
new PrimitiveDataFrameColumn<int>("Col3", new []{ 17, 27, 22, 37, 52 })
);
df.Columns.Add(ShiftIntColumn((PrimitiveDataFrameColumn<int>)df["Col1"], 2, "Col4"));
df.Columns.Add(DivAlt3((PrimitiveDataFrameColumn<int>) df["Col1"], (PrimitiveDataFrameColumn<double>) df["Col4"], "Col5"));
results in:
Complete notebook
See the following notebook for a full demonstration of the above:
https://github.com/dharmatech/dataframe-shift-example-cs/blob/003/dataframe-shift-example-cs.ipynb
Notes
It would be great if Microsoft.Data.Analysis came with column shift functionality.
It would also be great if column division handled nulls natively.
Would love to see other perhaps more idiomatic approaches to this.
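As one possible direction, here is a minimal sketch of the shift-and-divide logic on plain nullable lists, without Microsoft.Data.Analysis at all. The helper name Shift is hypothetical, and the sketch assumes that shifted-out positions should become null, as pandas does with NaN. It also relies on C#'s lifted operators, where dividing by a null double? yields null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mirroring pandas' shift(n): positive n moves values
// towards later rows, negative n towards earlier rows; gaps become null.
List<double?> Shift(IReadOnlyList<double?> values, int n)
{
    var result = new List<double?>(values.Count);
    for (int i = 0; i < values.Count; i++)
    {
        int source = i - n;
        result.Add(source >= 0 && source < values.Count ? values[source] : null);
    }
    return result;
}

var col1 = new List<double?> { 10, 20, 15, 30, 45 };
List<double?> col4 = Shift(col1, 2);

// Element-wise division; a null on either side yields null (lifted operator).
List<double?> col5 = col1.Zip(col4, (x, y) => x / y).ToList();

foreach (double? value in col5)
{
    Console.WriteLine(value?.ToString() ?? "null");
}
```

With the data above, col4 holds null, null, 10, 20, 15 and col5 holds null, null, 1.5, 1.5, 3. The resulting lists could then be passed to a PrimitiveDataFrameColumn<double> constructor in the same way the helper functions above do.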
I have a list of objects with a list of types inside each object, something very similar to this:
public class ExampleObject
{
public int Id {get; set;}
public IEnumerable <int> Types {get;set;}
}
For example:
var typesAdmited = new List<int> { 13, 11, 67, 226, 82, 1, 66 };
And inside the list of Object I have an object like this:
Object.Id = 288;
Object.Types = new List<int> { 94, 13, 11, 67, 254, 256, 226, 82, 1, 66, 497, 21};
But when I use LINQ to get all objects that have the admitted types, I don't get any results.
I am trying this:
var objectsAdmited = objects.Where(b => b.Types.All(t => typesAdmited.Contains(t)));
Example:
var typesAdmited = new List<int> { 13, 11, 67, 226, 82, 1, 66 };
var objectNotAdmited = new ExampleObject {Id = 1, Types = new List<int> {13,11}};
var objectAdmited = new ExampleObject {Id = 288, Types = new List<int> { 94, 13, 11, 67, 254, 256, 226, 82, 1, 66, 497, 21}};
var allObjects = new List<ExampleObject> { objectNotAdmited, objectAdmited };
var objectsAdmited = allObjects.Where(b => b.Types.All(t => typesAdmited.Contains(t)));
I get:
objectsAdmited = { }
And it should be:
objectsAdmited = { objectAdmited }
You have to swap the two lists in your LINQ query, so that every admitted type is looked up in the object's Types rather than the other way round:
var objectsAdmited = allObjects.Where(b => typesAdmited.All(t => b.Types.Contains(t)));
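To make the direction of the check concrete, here is a minimal sketch with shortened, made-up lists. The helper names ContainsAll and ContainsAllViaExcept are hypothetical; the second one shows an equivalent set-based formulation using Except.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var required = new List<int> { 13, 11, 67 };
var partial = new List<int> { 13, 11 };                // lacks 67
var superset = new List<int> { 94, 13, 11, 67, 254 };  // all three plus extras

// "Every required type is present in the candidate" (the corrected direction).
bool ContainsAll(IEnumerable<int> candidate) =>
    required.All(t => candidate.Contains(t));

// Equivalent: nothing is left after removing the candidate's types.
bool ContainsAllViaExcept(IEnumerable<int> candidate) =>
    !required.Except(candidate).Any();

Console.WriteLine(ContainsAll(partial));            // False
Console.WriteLine(ContainsAll(superset));           // True
Console.WriteLine(ContainsAllViaExcept(superset));  // True
```

The original query asked the opposite question ("is every type of the object admitted?"), which is why an object with extra types such as 94 or 254 was rejected.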
You can solve this using LINQ. See the small code block in the middle; the rest is boilerplate to make it a minimal, complete, verifiable example:
using System;
using System.Collections.Generic;
using System.Linq;
public class ExampleObject
{
public int Id { get; set; }
public IEnumerable<int> Types { get; set; }
}
class Program
{
static void Main (string [] args)
{
var obs = new List<ExampleObject>
{
new ExampleObject
{
Id=1,
Types=new List<int> { 94, 13, 11, 67, 254, 256, 226, 82, 1, 66, 497, 21 }
},
new ExampleObject
{
Id=288,
Types=new List<int> { 94, 13, 11, 67, 256, 226, 82, 1, 66, 497, 21 }
},
};
var must_support = new List<int>{11, 67, 254, 256, 226, 82, }; // only Id 1 fits
var must_support2 = new List<int>{11, 67, 256, 226, 82, }; // both fit
// this is the actual check: see for all objects in obs
// if all values of must_support are in the Types - Listing
var supports = obs.Where(o => must_support.All(i => o.Types.Contains(i)));
var supports2 = obs.Where(o => must_support2.All(i => o.Types.Contains(i)));
Console.WriteLine ("new List<int>{11, 67, 254, 256, 226, 82, };");
foreach (var o in supports)
Console.WriteLine (o.Id);
Console.WriteLine ("new List<int>{11, 67, 256, 226, 82, };");
foreach (var o in supports2)
Console.WriteLine (o.Id);
Console.ReadLine ();
}
}
Output:
new List<int>{11, 67, 254, 256, 226, 82, };
1
new List<int>{11, 67, 256, 226, 82, };
1
288
Suppose I have the following array (my sequences are all sorted in ascending order, and contain positive integers)
var tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
I wrote some code using LINQ and a loop to find the missing numbers, like this:
List<Int32> lstSearch = new List<int>();
var lstGroup = tabSequence
.Select((val, ind) => new { val, group = val - ind })
.GroupBy(v => v.group, v => v.val)
.Select(group => new{ GroupNumber = group.Key, Min = group.Min(), Max = group.Max() }).ToList();
for (int number = 0; number < lstGroup.Count; number++)
{
if (number < lstGroup.Count-1)
{
for (int missingNumber = lstGroup[number].Max+1; missingNumber < lstGroup[number+1].Min; missingNumber++)
lstSearch.Add(missingNumber);
}
}
var tabSequence2 = lstSearch.ToArray();
// Same result as var tabSequence2 = new[] {4, 5, 6, 10, 11, 13, 14, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31 };
This code works, but I'd like to know if there is a better way to do the same thing using only LINQ.
Maybe I am just not understanding the problem, but your code seems very complicated. You could make this a lot simpler:
int[] tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
var results = Enumerable.Range(1, tabSequence.Max()).Except(tabSequence);
//results is: 4, 5, 6, 10, 11, 13, 14, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31
I made a fiddle here
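One caveat: starting the range at 1 also reports every number below the first element as missing when the sequence does not begin at 1. A minimal sketch of a variant that starts at the smallest element instead, assuming a non-empty sequence (the sample values are made up):

```csharp
using System;
using System.Linq;

int[] sequence = { 5, 6, 9, 10, 12 };

// Start the range at the smallest element instead of 1, so the numbers
// below the first element (1 to 4 here) are not reported as missing.
int min = sequence.Min();
int max = sequence.Max();
int[] missing = Enumerable.Range(min, max - min + 1).Except(sequence).ToArray();

Console.WriteLine(string.Join(", ", missing)); // 7, 8, 11
```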
You can use Enumerable.Aggregate to your advantage. The overload I chose takes an accumulator seed (an empty List<IEnumerable<int>>) and then iterates over each item in your array.
A variable lastNr, declared before the Aggregate call, is set to the first number we iterate over. On each following iteration we compare the current number nr against lastNr.
If we are in sequence, we just advance lastNr to nr.
If not, we generate the missing numbers between lastNr and nr via Enumerable.Range(start, count), add them to our accumulator list, and then set lastNr to nr to continue.
public static List<IEnumerable<int>> GetMissingSeq(int[] seq)
{
var lastNr = int.MinValue;
var missing = seq.Aggregate(
new List<IEnumerable<int>>(),
(acc, nr) =>
{
if (lastNr == int.MinValue || lastNr == nr - 1)
{
lastNr = nr; // first ever or in sequence
return acc; // nothing to do
}
// not in sequence, add the missing numbers to our accumulator list
acc.Add(Enumerable.Range(lastNr + 1, nr - lastNr - 1));
lastNr = nr; // that's the new lastNr to compare against in the next iteration
return acc;
}
);
return missing;
}
Tested by:
public static void Main(string[] args)
{
var tabSequence = new[] { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };
Console.WriteLine(string.Join(", ", tabSequence));
foreach (var inner in GetMissingSeq(tabSequence))
Console.WriteLine(string.Join(", ", inner));
Console.ReadLine();
}
Output:
1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 // original followed by missing sequences
4, 5, 6
10, 11
13, 14
18, 19, 20, 21
24, 25, 26, 27, 28, 29, 30, 31
If you are not interested in the subsequences you can use GetMissingSeq(tabSequence).SelectMany(i => i) to flatten them into one IEnumerable.
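For comparison, here is a minimal LINQ-only sketch that avoids the captured lastNr variable by pairing each element with its successor via Zip. It assumes the sequence is strictly ascending with no duplicates, since a repeated value would make the range count negative and throw.

```csharp
using System;
using System.Linq;

int[] tabSequence = { 1, 2, 3, 7, 8, 9, 12, 15, 16, 17, 22, 23, 32 };

// Pair each element with its successor; every pair (a, b) contributes the
// numbers strictly between a and b (an empty range when b == a + 1).
int[] gaps = tabSequence
    .Zip(tabSequence.Skip(1), (a, b) => Enumerable.Range(a + 1, b - a - 1))
    .SelectMany(range => range)
    .ToArray();

Console.WriteLine(string.Join(", ", gaps));
// 4, 5, 6, 10, 11, 13, 14, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31
```

Dropping the SelectMany call keeps the gaps grouped per pair, which corresponds to the subsequence output shown above except that adjacent pairs also yield empty groups.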
I have 5 arrays, each of which represents one city. Each position in an array represents the distance to another city (all arrays share the same position for each specific city). I also have two dropdown lists from which the user is supposed to select two cities to calculate the distance between them.
It's set up like this:
// City0, City1, City2, City3, City4
int[] distanceFromCity0 = { 0, 16, 39, 9, 24 };
int[] distanceFromCity1 = { 16, 0, 36, 32, 54 };
int[] distanceFromCity2 = { 39, 36, 0, 37, 55 };
int[] distanceFromCity3 = { 9, 32, 37, 0, 21 };
int[] distanceFromCity4 = { 24, 54, 55, 21, 0 };
int cityOne = Convert.ToInt16(DropDownList1.SelectedValue);
int cityTwo = Convert.ToInt16(DropDownList2.SelectedValue);
And within the dropdown lists each city has the corresponding ID (city0 = 0, city1 = 1 etc)
I have tried a few different ways, but none of them really works.
So basically, how do I "connect" DropDownList1 to one of the arrays depending on the choice, and then connecting DropDownList2 to one of the positions in the selected array (from DropDownList1 selection) and print it out to Label1?
Is it easier with a 2D array?
This probably looks easy for you, but I'm a noob in C#.
One way would be to combine distanceFromCity0 ... distanceFromCity4 into a single jagged array (an array of arrays) and use the two cities as indexes to the distance value:
int[][] distanceBetweenCities = {
new[]{ 0, 16, 39, 9, 24 },
new[]{ 16, 0, 36, 32, 54 },
new[]{ 39, 36, 0, 37, 55 },
new[]{ 9, 32, 37, 0, 21 },
new[]{ 24, 54, 55, 21, 0 }
};
int cityOne = Convert.ToInt32(DropDownList1.SelectedValue);
int cityTwo = Convert.ToInt32(DropDownList2.SelectedValue);
var distance = distanceBetweenCities[cityOne][cityTwo];
Yes, using a two-dimensional array is very easy. You can think of it as a matrix. Some code like the following:
int[,] distanceMatrix = new int[5, 5] { { 0, 16, 39, 9, 24 },
{ 16, 0, 36, 32, 54 },
{ 39, 36, 0, 37, 55 },
{ 9, 32, 37, 0, 21 },
{ 24, 54, 55, 21, 0 }
};
int cityOne = Convert.ToInt32(DropDownList1.SelectedValue);
int cityTwo = Convert.ToInt32(DropDownList2.SelectedValue);
var distance = distanceMatrix[cityOne, cityTwo]; //the distance between cityOne and cityTwo;
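Because the dropdown values arrive as strings, it may be worth guarding the lookup. The following is a minimal sketch that takes the two selected values as plain strings (standing in for DropDownList1.SelectedValue and DropDownList2.SelectedValue) and returns null for anything that is not a valid city ID. The helper name LookupDistance is hypothetical.

```csharp
using System;

int[,] distances =
{
    {  0, 16, 39,  9, 24 },
    { 16,  0, 36, 32, 54 },
    { 39, 36,  0, 37, 55 },
    {  9, 32, 37,  0, 21 },
    { 24, 54, 55, 21,  0 }
};

// Hypothetical helper: parses both selected values and returns null when
// either one is not a number or lies outside the matrix.
int? LookupDistance(string originValue, string destinationValue)
{
    int cityCount = distances.GetLength(0);
    if (!int.TryParse(originValue, out int origin) ||
        !int.TryParse(destinationValue, out int destination))
    {
        return null;
    }
    if (origin < 0 || origin >= cityCount ||
        destination < 0 || destination >= cityCount)
    {
        return null;
    }
    return distances[origin, destination];
}

Console.WriteLine(LookupDistance("0", "3"));          // 9
Console.WriteLine(LookupDistance("2", "4"));          // 55
Console.WriteLine(LookupDistance("7", "1") is null);  // True
```

In the page itself the result would then be written to the label, for example by assigning the value's string form to Label1.Text.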
For example
List contains integer values 34, 78, 20, 10, 17, 99, 101, 24, 50, 13
and the value to put is 11 at position 1, 4 and 5
Position is the index value which starts from 0
so the final result is => 34, 11, 78, 20, 10, 11, 17, 11, 99, 101, 24, 50, 13
My current code is as follows:
List<int> list_iNumbers = new List<int>();
list_iNumbers.Add(34);
list_iNumbers.Add(78);
list_iNumbers.Add(20);
list_iNumbers.Add(10);
list_iNumbers.Add(17);
list_iNumbers.Add(99);
list_iNumbers.Add(101);
list_iNumbers.Add(24);
list_iNumbers.Add(50);
list_iNumbers.Add(13);
List<int> list_iPosition = new List<int>();
list_iPosition.Add(1);
list_iPosition.Add(4);
list_iPosition.Add(5);
int iValueToInsert = 11;
Now, how do I insert at these positions and get the correct result?
Use the Insert(index, element) method instead of Add. Something like this:
foreach(var pos in list_iPosition.OrderByDescending(x => x))
list_iNumbers.Insert(pos, iValueToInsert);
You have to do it from the last index, to make it right. That's why I used OrderByDescending first.
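Non-LINQ solution:
// Assumes list_iPosition is sorted ascending. Each earlier insertion shifts
// the later positions right by one, hence the "+ i".
for (int i = 0; i < list_iPosition.Count; i++)
{
list_iNumbers.Insert(list_iPosition[i] + i, iValueToInsert);
}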
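Every call to Insert shifts all elements behind the insertion point, so for long lists with many positions it can be cheaper to build a new list in one pass. A minimal sketch of that alternative, assuming the positions refer to indexes in the original list and are all smaller than its length (the variable names are simplified for illustration):

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 34, 78, 20, 10, 17, 99, 101, 24, 50, 13 };
var positions = new HashSet<int> { 1, 4, 5 }; // indexes in the ORIGINAL list
int valueToInsert = 11;

// Build a new list in one pass: emit the extra value just before every
// original element whose index is one of the requested positions.
var result = new List<int>(numbers.Count + positions.Count);
for (int i = 0; i < numbers.Count; i++)
{
    if (positions.Contains(i))
    {
        result.Add(valueToInsert);
    }
    result.Add(numbers[i]);
}

Console.WriteLine(string.Join(", ", result));
// 34, 11, 78, 20, 10, 11, 17, 11, 99, 101, 24, 50, 13
```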