I have a list of DataPoint such as
List<DataPoint> newpoints = new List<DataPoint>();
where DataPoint is a class consisting of nine double features, A to I, and
newpoints.Count = 100000 points (i.e. each point consists of nine double features from A to I).
I need to normalize the list newpoints using the min-max normalization method, with a scale range between 0 and 1.
I have implemented the following steps so far.
Each DataPoint feature is assigned to a one-dimensional array. For example, the code for feature A:
for (int i = 0; i < newpoints.Count; i++)
{ array_A[i] = newpoints[i].A; } // and so on for all nine double features
Then I applied the min-max normalization method. For example, the code for feature A:
normalized_featureA = (((array_A[i] - array_A.Min()) * (1 - 0)) /
                       (array_A.Max() - array_A.Min())) + 0;
The method works correctly, but it takes too long (3 minutes and 45 seconds).
How can I apply min-max normalization using LINQ in C# to reduce the time to a few seconds?
I found the Stack Overflow question How to normalize a list of int values, but my problem is
double valueMax = list.Max(); // I need the max of feature A over all 100000 points
double valueMin = list.Min(); // I need the min of feature A over all 100000 points
and so on for each of the nine features.
Your help will be highly appreciated.
As an alternative to modelling your 9 features as double properties on a class "DataPoint", you could also model a datapoint of 9 doubles as an array, with the benefit being that you can do all 9 calculations in one pass, again, using LINQ:
var newpoints = new List<double[]>
{
new []{1.23, 2.34, 3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12},
new []{2.34, 3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23},
new []{3.45, 4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23, 13.34},
new []{4.56, 5.67, 6.78, 7.89, 8.90, 9.12, 12.23, 13.34, 15.32}
};
var featureStats = newpoints
// We make the assumption that all 9 data points are present on each row.
.First()
// 2 Anon Projections - first to determine min / max as a function of column
.Select((np, idx) => new
{
Idx = idx,
Max = newpoints.Max(x => x[idx]),
Min = newpoints.Min(x => x[idx])
})
// Second to add in the dynamic Range
.Select(x => new {
x.Idx,
x.Max,
x.Min,
Range = x.Max - x.Min
})
// Back to array for O(1) lookups.
.ToArray();
// Do the normalization for the columns, for each row.
var normalizedFeatures = newpoints
.Select(np => np.Select(
(i, idx) => (i - featureStats[idx].Min) / featureStats[idx].Range));
foreach(var datapoint in normalizedFeatures)
{
Console.WriteLine(string.Join(",", datapoint.Select(x => x.ToString("0.00"))));
}
Result:
0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
0.33,0.33,0.33,0.33,0.34,0.47,0.23,0.05,0.50
0.67,0.67,0.67,0.67,0.69,0.91,0.28,0.75,0.68
1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00
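If your data starts out in the DataPoint class from the question, a one-off projection gets it into the array shape this answer assumes (a sketch; the property names A through I are taken from the question, and asArrays then stands in for newpoints in the code above):
var asArrays = dataPointList
    .Select(p => new[] { p.A, p.B, p.C, p.D, p.E, p.F, p.G, p.H, p.I })
    .ToList();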
Stop recalculating the maximum/minimum over and over again; they don't change.
double maxInFeatureA = array_A.Max();
double minInFeatureA = array_A.Min();
// somewhere in the loop:
normalized_featureA = (((array_A[i] - minInFeatureA) * (1 - 0)) /
                       (maxInFeatureA - minInFeatureA)) + 0;
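Hoisted out of the loop, the full pass for feature A looks like this (a sketch; normalized_A is a pre-allocated output array assumed for illustration):
double maxA = array_A.Max();
double minA = array_A.Min();
double rangeA = maxA - minA;
for (int i = 0; i < array_A.Length; i++)
{
    // Min/max are computed once; each element then costs one subtraction and one division.
    normalized_A[i] = (array_A[i] - minA) / rangeA;
}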
Calling Max()/Min() on an array inside a foreach/for loop is really expensive with many elements.
I suggest you take this code: Array data normalization
and use it as
var normalizedPoints = newPoints.Select(x => x.A)
    .NormalizeData(1, 1)
    .ToList();
double min = newpoints.Min(p => p.A);
double max = newpoints.Max(p => p.A);
double normalizer = 1 / (max - min); // compute the scale factor once
var normalizedFeatureA = newpoints.Select(p => (p.A - min) * normalizer);
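To cover all nine features without repeating those lines nine times, the same pattern can be driven from an array of selectors (a sketch, assuming the DataPoint properties A through I from the question):
// One selector per feature.
Func<DataPoint, double>[] selectors =
{
    p => p.A, p => p.B, p => p.C, p => p.D, p => p.E,
    p => p.F, p => p.G, p => p.H, p => p.I
};
var normalized = selectors
    .Select(sel =>
    {
        double min = newpoints.Min(sel);
        double max = newpoints.Max(sel);
        double scale = 1 / (max - min);
        // Min/max are computed once per feature, not once per element.
        return newpoints.Select(p => (sel(p) - min) * scale).ToArray();
    })
    .ToArray(); // normalized[0] holds feature A, normalized[1] feature B, and so on.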
Related
I got a list like this:
class Article
{
    ...
    public DateTime PubTime { get; set; }
    ...
}
List<Article> articles
Now I want to group this list by hour range: [0-5, 6-11, 12-17, 18-23].
I know there is a cumbersome way to do this:
var firstRange = articles.Count(a => a.PubTime.Hour >= 0 && a.PubTime.Hour <= 5);
But I want a more elegant way. How can I do that? Using LINQ or anything else?
Group by Hour / 6:
var grouped = articles.GroupBy(a => a.PubTime.Hour / 6);
IDictionary<int, int> CountsByHourGrouping = grouped.ToDictionary(g => g.Key, g => g.Count());
The key in the dictionary is the period (0 representing 0-5, 1 representing 6-11, 2 representing 12-17, and 3 representing 18-23). The value is the count of articles in that period.
Note that your dictionary will only contain values where those times existed in the source data, so it won't always contain 4 items.
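If you need all four periods present even when no articles fall in one of them, you can project the fixed key range over the dictionary (a sketch building on CountsByHourGrouping above):
var countsByPeriod = Enumerable.Range(0, 4)
    .ToDictionary(
        period => period,
        // Missing periods get a count of 0.
        period => CountsByHourGrouping.TryGetValue(period, out var count) ? count : 0);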
You could write a CheckRange function that takes your value and a range and returns a bool, to make your code more reusable and elegant.
Function example (an extension method has to live in a static class):
static class HourExtensions
{
    public static bool CheckRange(this int number, int min, int max) => number >= min && number <= max;
}
You can now use this function to check whether PubTime.Hour is within the desired limits.
Implementation example:
var firstRange = articles.Count(a => a.PubTime.Hour.CheckRange(0, 5));
So I have a list box of numbers, and I want to subtract an integer from every single number in the list box. Here is an example:
1
2
3
4
5
I want to get the absolute value of the difference
Math.Abs(2 - 1)
Math.Abs(2 - 2)
Math.Abs(2 - 3)
Math.Abs(2 - 4)
Math.Abs(2 - 5)
And put them in a list box.
I've tried:
while (i < listBox1.Items.Count)
{
    result -= Convert.ToInt32(listBox1.Items[i++]);
    int result1 = Convert.ToInt32(result);
    int sub = Math.Abs(result1);
}
Would this work?
I am using LINQ's Select(x => { return x; }) notation to perform an operation on each element of the array and return a value. In this case, the operation is Math.Abs of the difference between a given number and each element. _absDiffs will be an IEnumerable<int>, which you can call .ToArray() on to turn it into an int[].
int[] _nums = {1,2,3,4,5};
int _number = 2;
var _absDiffs = _nums.Select(num => { return Math.Abs(_number - num); });
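To push the results back into the UI, you can read the numbers out of listBox1 and show the differences in another list box (a sketch; listBox2 is a hypothetical target control on the form):
var diffs = listBox1.Items.Cast<object>()
    .Select(item => Math.Abs(_number - Convert.ToInt32(item)))
    .ToArray();
listBox2.Items.Clear();
foreach (var d in diffs)
{
    listBox2.Items.Add(d);
}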
All,
Having reviewed Stack Overflow and the wider internet, I am still struggling to efficiently calculate percentiles using LINQ.
A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. The example below attempts to convert a list of values into an array where each (unique) value is represented with its associated percentile.
The Min() and Max() of the list are necessarily the 0% and 100% of the returned array of percentiles.
Using LINQPad, the code below generates the required output, a VP[]:
This can be interpreted as:
- At 0% the minimum value is 1
- At 100% the maximum value is 3
- At 50% between the minimum and maximum the value is 2
void Main()
{
var list = new List<double> {1,2,3};
double denominator = list.Count - 1;
var answer = list.Select(x => new VP
{
Value = x,
Percentile = list.Count(y => x > y) / denominator
})
//.GroupBy(grp => grp.Value) --> commented out until attempted duplicate solution
.ToArray();
answer.Dump();
}
public struct VP
{
public double Value;
public double Percentile;
}
However, this returns an incorrect VP[] when the list contains duplicate entries (e.g. 1, 2, 2, 3):
My attempts to group by unique values in the list (by including ".GroupBy(grp => grp.Value)") have failed to yield the desired result (Value = 2 and Percentile = 0.666):
All suggestions are welcome, including whether this is an efficient approach given the repeated iteration with "list.Count(y => x > y)".
As always, thanks
Shannon
I'm not sure I understand the requirements of this question. When I ran the accepted answer's code I got this result:
But if I change the input to this:
var dataSet = new List<double> { 1, 1, 1, 1, 2, 3, 3, 3, 2 };
...I then get this result:
With the line "The min() and max() of the list are necessarily the 0% and 100% of the returned array percentiles." it seems to me the OP is asking for the values to be from 0 to 1, but the updated result goes beyond 1.
It also seems wrong to me that the first value should be 0% as I'm not sure what that means in context to the data.
After reading the linked Wikipedia page it seems that the OP is actually trying to do the reverse calculation to computing the percentile value. In fact the article says that the percentile for 0 is undefined. That makes sense because a percentile of 0 would be the empty set of values - and what is the maximum value of an empty set?
The OP seems to be computing the percentile from the values. So, in that sense, and knowing that 0 is undefined, it seems that the most appropriate value to compute is the percentage of values that are equal to or below each distinct value in the set.
Now, if I use Microsoft's Reactive Framework team's Interactive Extensions (NuGet "Ix-Main"), then I can run this code (note that the VP struct here uses a Proportion field rather than the question's Percentile):
var dataSet = new List<double> { 1, 1, 1, 1, 2, 3, 3, 3, 2 };
var result =
dataSet
.GroupBy(x => x)
.Scan(
new VP()
{
Value = double.MinValue, Proportion = 0.0
},
(a, x) =>
new VP()
{
Value = x.Key,
Proportion = a.Proportion + (double)x.Count() / dataSet.Count
});
I get this result:
This tells me that approximately 44% of the values are 1; that approximately 67% of the values are 1 or 2; and 100% of the values are either 1, 2, or 3.
This seems to me to be the most logical computation for the requirements.
This is how I did it. I changed a few of the variable names to make the context clearer.
var dataSet = new List<double> { 1, 2, 3, 2 };
double denominator = dataSet.Count - 1;
var uniqueValues = dataSet.Distinct();
// Proportion of values strictly below each entry; duplicates each contribute.
var vp = dataSet.Select(value => new VP
{
    Value = value,
    Proportion = dataSet.Count(datum => value > datum) / denominator
});
// Collapse duplicates: sum the proportions of each distinct value.
var answer = uniqueValues.Select(u => new VP{
    Value = u,
    Proportion = vp.Where(v => v.Value == u).Select(x => x.Proportion).Sum()
});
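As a quick check of the arithmetic for the duplicated list {1, 2, 3, 2} with denominator 3: value 1 gets 0/3 = 0; value 2 appears twice, each occurrence contributing 1/3, so its summed proportion is 2/3 ≈ 0.666; and value 3 gets 3/3 = 1, matching the desired result from the question.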
void Main()
{
var list = new List<double> {1,2,3};
double denominator = list.Count - 1;
var answer = list.OrderBy(x => x).Select(x => new VP
{
    Value = x,
    // IndexOf returns the first occurrence, so duplicates share one proportion.
    Proportion = list.IndexOf(x) / denominator
})
.ToArray();
answer.Dump();
}
public struct VP
{
public double Value;
public double Proportion;
}
I have an array of doubles: Double[] array = new Double[5];
For example, if the array contains data like this:
{0.5, 1.5, 1.1, 0.6, 2}
How do I find the number that is closest to 1? The output should be 1.1, because it's the closest to 1 in this case.
var result = source.OrderBy(x => Math.Abs(1 - x)).First();
Requires using System.Linq; at the top of the file. It's an O(n log n) solution.
Update
If you're really concerned about performance and want an O(n) solution, you can use the MinBy() extension method from the morelinq library.
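For example (a sketch; on .NET 6 and later this resolves to the built-in Enumerable.MinBy, while MoreLINQ 3.x's MinBy returns all minima, so you would append .First() there):
var result = source.MinBy(x => Math.Abs(1 - x));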
Or you could use the Aggregate() method:
var result = source.Aggregate(
    // Seed: best value so far and its distance from 1.
    new { val = 0d, abs = double.MaxValue },
    // Keep the accumulator unless the current item is strictly closer to 1.
    (a, i) => Math.Abs(1 - i) > a.abs ? a : new { val = i, abs = Math.Abs(1 - i) },
    // Result selector: return just the value.
    a => a.val);
You can achieve this in a simple way using LINQ:
var closestTo1 = array.OrderBy(x => Math.Abs(x - 1)).First();
Something like this should be easy for any programmer to understand, and it has O(n) complexity (non-LINQ):
double minValue = array[0];
double minDifference = Math.Abs(array[0] - 1);
foreach (double val in array)
{
    double dif = Math.Abs(val - 1);
    if (dif < minDifference)
    {
        minDifference = dif;
        minValue = val;
    }
}
After this code executes, minValue will hold the required value.
Code summary:
It starts by taking the first element of the array as the minimum value, and that element's absolute difference from 1 as the minimum difference.
The loop then linearly searches the array; whenever an element's difference from 1 is smaller than the minimum difference so far, it records that element as the new minimum value and its difference as the new minimum difference.
Say I have a class like so:
public class Work
{
public string Name;
public double Time;
public Work(string name, double time)
{
Name = name;
Time = time;
}
}
And I have a List<Work> with about 20 values that are all filled in:
List<Work> workToDo = new List<Work>();
// Populate workToDo
Is there any possible way that I can group workToDo into segments where each segment's sum of Time is a particular value? Say workToDo has values like so:
Name | Time
A | 3.50
B | 2.75
C | 4.25
D | 2.50
E | 5.25
F | 3.75
If I want the sum of times to be 7, each segment (or List<Work>) should have a set of values where the sum of all the Times is 7 or close to it. Is this even remotely possible, or is it just a stupid question/idea? I am using this code to separate workToDo into segments of 4:
var query = workToDo.Select(x => x.Time)
.Select((x, i) => new { Index = i, Value = x})
.GroupBy(y => y.Index / 4)
.ToList();
But I am not sure how to do it based on the Times.
Here's a query that segments your data into groups whose times are near 7, but not over:
Func<List<Work>, int, int, double> sumOfRange = (list, start, end) => list
    .Skip(start)
    .TakeWhile((x, index) => index <= end)
    .Sum(l => l.Time);
double segmentSize = 7;
var result = Enumerable.Range(0, workToDo.Count())
    .Select(index => workToDo
        .Skip(index)
        .TakeWhile((x, i) => sumOfRange(workToDo, index, i) <= segmentSize));
The output for your example data set is:
A 3.5
B 2.75
total: 6.25
B 2.75
C 4.25
total: 7
C 4.25
D 2.5
total: 6.75
D 2.5
total: 2.5
E 5.25
total: 5.25
F 3.75
total: 3.75
If you want to allow segments to total over seven, you could increase the segmentSize variable by 25% or so (i.e. make it 8.75).
This solution recurses through all combinations and returns the ones whose sums are close enough to the target sum.
Here is the pretty front-end method that lets you specify the list of work, the target sum, and how close the sums must be:
public List<List<Work>> GetCombinations(List<Work> workList,
                                        double targetSum,
                                        double threshold)
{
    return GetCombinations(0,
                           new List<Work>(),
                           workList,
                           targetSum - threshold,
                           targetSum + threshold);
}
Here is the recursive method that does all of the work:
private List<List<Work>> GetCombinations(double currentSum,
List<Work> currentWorks,
List<Work> remainingWorks,
double minSum,
double maxSum)
{
// Filter out the works that would go over the maxSum.
var newRemainingWorks = remainingWorks.Where(x => currentSum + x.Time <= maxSum)
.ToList();
// Create the possible combinations by adding each newRemainingWork to the
// list of current works.
var sums = newRemainingWorks
.Select(x => new
{
Works = currentWorks.Concat(new [] { x }).ToList(),
Sum = currentSum + x.Time
})
.ToList();
// The initial combinations are the possible combinations that are
// within the sum range.
var combinations = sums.Where(x => x.Sum >= minSum).Select(x => x.Works);
// The additional combinations get determined in the recursive call.
var newCombinations = from index in Enumerable.Range(0, sums.Count)
from combo in GetCombinations
(
sums[index].Sum,
sums[index].Works,
newRemainingWorks.Skip(index + 1).ToList(),
minSum,
maxSum
)
select combo;
return combinations.Concat(newCombinations).ToList();
}
This line will get combinations that sum to 7 +/- 1:
GetCombinations(workToDo, 7, 1);
What you are describing is a packing problem (where the tasks are being packed into 7-hour containers). Whilst it would be possible to use LINQ syntax in a solution to this problem, there is no solution inherent in LINQ that I am aware of.
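For instance, a common heuristic for this kind of packing is greedy first-fit decreasing: sort the items by descending time, then drop each one into the first bin it fits into. A minimal sketch (an approximation, not an exact bin-packing solution; PackGreedy is a hypothetical helper name):
// Greedy first-fit decreasing: fast, but not guaranteed optimal.
List<List<Work>> PackGreedy(List<Work> work, double capacity)
{
    var bins = new List<List<Work>>();
    foreach (var w in work.OrderByDescending(x => x.Time))
    {
        // Place the item in the first bin with enough remaining capacity.
        var bin = bins.FirstOrDefault(b => b.Sum(x => x.Time) + w.Time <= capacity);
        if (bin == null)
        {
            bin = new List<Work>();
            bins.Add(bin);
        }
        bin.Add(w);
    }
    return bins;
}
// Usage: var segments = PackGreedy(workToDo, 7);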