C# - For vs Foreach - Huge performance difference - c#

i was making some optimizations to an algorithm that finds the smallest number that is bigger than X, in a given array, but then a i stumbled on a strange difference. On the code bellow, the "ForeachUpper" ends in 625ms, and the "ForUpper" ends in, i believe, a few hours (insanely slower). Why so?
class Teste
{
public double Valor { get; set; }
public Teste(double d)
{
Valor = d;
}
public override string ToString()
{
return "Teste: " + Valor;
}
}
private static IEnumerable<Teste> GetTeste(double total)
{
for (int i = 1; i <= total; i++)
{
yield return new Teste(i);
}
}
static void Main(string[] args)
{
int total = 1000 * 1000*30 ;
double test = total/2+.7;
var ieTeste = GetTeste(total).ToList();
Console.WriteLine("------------");
ForeachUpper(ieTeste.Select(d=>d.Valor), test);
Console.WriteLine("------------");
ForUpper(ieTeste.Select(d => d.Valor), test);
Console.Read();
}
private static void ForUpper(IEnumerable<double> bigList, double find)
{
var start1 = DateTime.Now;
double uppper = 0;
for (int i = 0; i < bigList.Count(); i++)
{
var toMatch = bigList.ElementAt(i);
if (toMatch >= find)
{
uppper = toMatch;
break;
}
}
var end1 = (DateTime.Now - start1).TotalMilliseconds;
Console.WriteLine(end1 + " = " + uppper);
}
private static void ForeachUpper(IEnumerable<double> bigList, double find)
{
var start1 = DateTime.Now;
double upper = 0;
foreach (var toMatch in bigList)
{
if (toMatch >= find)
{
upper = toMatch;
break;
}
}
var end1 = (DateTime.Now - start1).TotalMilliseconds;
Console.WriteLine(end1 + " = " + upper);
}
Thanks

IEnumerable<T> is not indexable.
The Count() and ElementAt() extension methods that you call in every iteration of your for loop are O(n); they need to loop through the collection to find the count or the nth element.
Moral: Know thy collection types.

The reason for this difference is that your for loop will execute bigList.Count() at every iteration. This is really costly in your case, because it will execute the Select and iterate the complete result set.
Furthermore, you are using ElementAt which again executes the select and iterates it up to the index you provided.

Related

How to auto-increment number and letter to generate a string sequence wise in c#

I have to make a string which consists a string like - AAA0009, and once it reaches AAA0009, it will generate AA0010 to AAA0019 and so on.... till AAA9999 and when it will reach to AAA9999, it will give AAB0000 to AAB9999 and so on till ZZZ9999.
I want to use static class and static variables so that it can auto increment by itself on every hit.
I have tried some but not even close, so help me out thanks.
Thanks for being instructive I was trying as I Said already but anyways you already want to put negatives over there without even knowing the thing:
Code:
public class GenerateTicketNumber
{
private static int num1 = 0;
public static string ToBase36()
{
const string base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
var sb = new StringBuilder(9);
do
{
sb.Insert(0, base36[(byte)(num1 % 36)]);
num1 /= 36;
} while (num1 != 0);
var paddedString = "#T" + sb.ToString().PadLeft(8, '0');
num1 = num1 + 1;
return paddedString;
}
}
above is the code. this will generate a sequence but not the way I want anyways will use it and thanks for help.
Though there's already an accepted answer, I would like to share this one.
P.S. I do not claim that this is the best approach, but in my previous work we made something similar using Azure Table Storage which is a no sql database (FYI) and it works.
1.) Create a table to store your running ticket number.
public class TicketNumber
{
public string Type { get; set; } // Maybe you want to have different types of ticket?
public string AlphaPrefix { get; set; }
public string NumericPrefix { get; set; }
public TicketNumber()
{
this.AlphaPrefix = "AAA";
this.NumericPrefix = "0001";
}
public void Increment()
{
int num = int.Parse(this.NumericPrefix);
if (num + 1 >= 9999)
{
num = 1;
int i = 2; // We are assuming that there are only 3 characters
bool isMax = this.AlphaPrefix == "ZZZ";
if (isMax)
{
this.AlphaPrefix = "AAA"; // reset
}
else
{
while (this.AlphaPrefix[i] == 'Z')
{
i--;
}
char iChar = this.AlphaPrefix[i];
StringBuilder sb = new StringBuilder(this.AlphaPrefix);
sb[i] = (char)(iChar + 1);
this.AlphaPrefix = sb.ToString();
}
}
else
{
num++;
}
this.NumericPrefix = num.ToString().PadLeft(4, '0');
}
public override string ToString()
{
return this.AlphaPrefix + this.NumericPrefix;
}
}
2.) Make sure you perform row-level locking and issue an error when it fails.
Here's an oracle syntax:
SELECT * FROM TICKETNUMBER WHERE TYPE = 'TYPE' FOR UPDATE NOWAIT;
This query locks the row and returns an error if the row is currently locked by another session.
We need this to make sure that even if you have millions of users generating a ticket number, it will not mess up the sequence.
Just make sure to save the new ticket number before you perform a COMMIT.
I forgot the MSSQL version of this but I recall using WITH (ROWLOCK) or something. Just google it.
3.) Working example:
static void Main()
{
TicketNumber ticketNumber = new TicketNumber();
ticketNumber.AlphaPrefix = "ZZZ";
ticketNumber.NumericPrefix = "9999";
for (int i = 0; i < 10; i++)
{
Console.WriteLine(ticketNumber);
ticketNumber.Increment();
}
Console.Read();
}
Output:
Looking at your code that you've provided, it seems that you're backing this with a number and just want to convert that to a more user-friendly text representation.
You could try something like this:
private static string ValueToId(int value)
{
var parts = new List<string>();
int numberPart = value % 10000;
parts.Add(numberPart.ToString("0000"));
value /= 10000;
for (int i = 0; i < 3 || value > 0; ++i)
{
parts.Add(((char)(65 + (value % 26))).ToString());
value /= 26;
}
return string.Join(string.Empty, parts.AsEnumerable().Reverse().ToArray());
}
It will take the first 4 characters and use them as is, and then for the remainder of the value if will convert it into characters A-Z.
So 9999 becomes AAA9999, 10000 becomes AAB0000, and 270000 becomes ABB0000.
If the number is big enough that it exceeds 3 characters, it will add more letters at the start.
Here's an example of how you could go about implementing it
void Main()
{
string template = #"AAAA00";
var templateChars = template.ToCharArray();
for (int i = 0; i < 100000; i++)
{
templateChars = IncrementCharArray(templateChars);
Console.WriteLine(string.Join("",templateChars ));
}
}
public static char Increment(char val)
{
if(val == '9') return 'A';
if(val == 'Z') return '0';
return ++val;
}
public static char[] IncrementCharArray(char[] val)
{
if (val.All(chr => chr == 'Z'))
{
var newArray = new char[val.Length + 1];
for (int i = 0; i < newArray.Length; i++)
{
newArray[i] = '0';
}
return newArray;
}
int length = val.Length;
while (length > -1)
{
char lastVal = val[--length];
val[length] = Increment(lastVal);
if ( val[length] != '0') break;
}
return val;
}

Dictionary or list in c#

I got a strange C# programming problem. There is a data retrieval in groups of random lengths of number groups. The numbers should be all unique, like:
group[1]{1,2,15};
group[2]{3,4,7,33,22,100};
group[3]{11,12,9};
// Now there is a routine that adds a number to a group.
// For the example, just imagine the active group looks like:
// group[active]=(10,5,0)
group[active].add(new_number);
// Now if 100 were to be added to the active group
// then the active group should be merged to group[2]
// (as that one already contained 100)
// And then as a result it would like
group[1]{1,2,15};
group[2]{3,4,7,33,22,100,10,5,0}; // 10 5 0 added to group[2]
group[3]{11,12,9};
// 100 wasn't added to group[2] since it was already in there.
If the number to be added is already used (not unique) in a previous group.
Then I should merge all numbers in the active group towards that previous group, so I don’t get double numbers.
So in the above example if number 100 was added to the active
group, then all numbers in the group[active] should be merged into group[2].
And then the group[active] should start clean fresh again without any items. And since 100 was already in group[2] it should not be added double.
I am not entirely sure on how to deal with this in a proper way.
As an important criteria here is that it has to work fast.
I will have around minimal 30 groups (upper-bound unknown might be 2000 or more), and their length on average contains five integer numbers, but it could be much longer or only one number.
I kind of feel that I am reinventing the wheel here.
I wonder what this problem is called (does it go by a name, some sorting, or grouping math problem)?, with a name I might find some articles related to such problems.
But maybe it’s indeed something new, then what would be recommended? Should I use list of lists or a dictionary of lists.. or something else? Somehow the checking if the number is already present should be done fast.
I'm thinking along this path now and am not sure if it’s the best.
Instead of a single number, I use a struct now. It wasn't written in the original question as I was afraid, explaining that would make it too complex.
struct data{int ID; int additionalNumber}
Dictionary <int,List<data>> group =new Dictionary<int, List<data>>();
I can step aside from using a struct in here. A lookup list could connect the other data to the proper index. So this makes it again more close to the original description.
On a side note, great answers are given.
So far I don’t know yet what would work best for me in my situation.
Note on the selected answer
Several answers were given here, but I went for the pure dictionary solution.
Just as a note for people in similar problem scenarios: I'd still recommend testing, and maybe the others work better for you. It’s just that in my case currently it worked best. The code was also quite short which I liked, and a dictionary adds also other handy options for my future coding on this.
I would go with Dictionary<int, HashSet<int>>, since you want to avoid duplicates and want a fast way to check if given number already exists:
Usage example:
var groups = new Dictionary<int, HashSet<int>>();
// populate the groups
groups[1] = new HashSet<int>(new[] { 1,2,15 });
groups[2] = new HashSet<int>(new[] { 3,4,7,33,22,100 });
int number = 5;
int groupId = 4;
bool numberExists = groups.Values.Any(x => x.Contains(number));
// if there is already a group that contains the number
// merge it with the current group and add the new number
if (numberExists)
{
var group = groups.First(kvp => kvp.Value.Contains(number));
groups[group.Key].UnionWith(groups[groupId]);
groups[groupId] = new HashSet<int>();
}
// otherwise just add the new number
else
{
groups[groupId].Add(number);
}
From what I gather you want to iteratively assign numbers to groups satisfying these conditions:
Each number can be contained in only one of the groups
Groups are sets (numbers can occur only once in given group)
If number n exists in group g and we try to add it to group g', all numbers from g' should be transferred to g instead (avoiding repetitions in g)
Although approaches utilizing Dictionary<int, HashSet<int>> are correct, here's another one (more mathematically based).
You could simply maintain a Dictionary<int, int>, in which the key would be the number, and the corresponding value would indicate the group, to which that number belongs (this stems from condition 1.). And here's the add routine:
//let's assume dict is a reference to the dictionary
//k is a number, and g is a group
void AddNumber(int k, int g)
{
//if k already has assigned a group, we assign all numbers from g
//to k's group (which should be O(n))
if(dict.ContainsKey(k) && dict[k] != g)
{
foreach(var keyValuePair in dict.Where(kvp => kvp.Value == g).ToList())
dict[keyValuePair.Key] = dict[k];
}
//otherwise simply assign number k to group g (which should be O(1))
else
{
dict[k] = g;
}
}
Notice that from a mathematical point of view what you want to model is a function from a set of numbers to a set of groups.
I have kept it as easy to follow as I can, trying not to impact the speed or deviate from the spec.
Create a class called Groups.cs and copy and paste this code into it:
using System;
using System.Collections.Generic;
namespace XXXNAMESPACEXXX
{
public static class Groups
{
public static List<List<int>> group { get; set; }
public static int active { get; set; }
public static void AddNumberToGroup(int numberToAdd, int groupToAddItTo)
{
try
{
if (group == null)
{
group = new List<List<int>>();
}
while (group.Count < groupToAddItTo)
{
group.Add(new List<int>());
}
int IndexOfListToRefresh = -1;
List<int> NumbersToMove = new List<int>();
foreach (List<int> Numbers in group)
{
if (Numbers.Contains(numberToAdd) && (group.IndexOf(Numbers) + 1) != groupToAddItTo)
{
active = group.IndexOf(Numbers) + 1;
IndexOfListToRefresh = group.IndexOf(Numbers);
foreach (int Number in Numbers)
{
NumbersToMove.Add(Number);
}
}
}
foreach (int Number in NumbersToMove)
{
if (!group[groupToAddItTo - 1].Contains(Number))
{
group[groupToAddItTo - 1].Add(Number);
}
}
if (!group[groupToAddItTo - 1].Contains(numberToAdd))
{
group[groupToAddItTo - 1].Add(numberToAdd);
}
if (IndexOfListToRefresh != -1)
{
group[IndexOfListToRefresh] = new List<int>();
}
}
catch//(Exception ex)
{
//Exception handling here
}
}
public static string GetString()
{
string MethodResult = "";
try
{
string Working = "";
bool FirstPass = true;
foreach (List<int> Numbers in group)
{
if (!FirstPass)
{
Working += "\r\n";
}
else
{
FirstPass = false;
}
Working += "group[" + (group.IndexOf(Numbers) + 1) + "]{";
bool InnerFirstPass = true;
foreach (int Number in Numbers)
{
if (!InnerFirstPass)
{
Working += ", ";
}
else
{
InnerFirstPass = false;
}
Working += Number.ToString();
}
Working += "};";
if ((active - 1) == group.IndexOf(Numbers))
{
Working += " //<active>";
}
}
MethodResult = Working;
}
catch//(Exception ex)
{
//Exception handling here
}
return MethodResult;
}
}
}
I don't know if foreach is more or less efficient than standard for loops, so I have made an alternative version that uses standard for loops:
using System;
using System.Collections.Generic;
namespace XXXNAMESPACEXXX
{
public static class Groups
{
public static List<List<int>> group { get; set; }
public static int active { get; set; }
public static void AddNumberToGroup(int numberToAdd, int groupToAddItTo)
{
try
{
if (group == null)
{
group = new List<List<int>>();
}
while (group.Count < groupToAddItTo)
{
group.Add(new List<int>());
}
int IndexOfListToRefresh = -1;
List<int> NumbersToMove = new List<int>();
for(int i = 0; i < group.Count; i++)
{
List<int> Numbers = group[i];
int IndexOfNumbers = group.IndexOf(Numbers) + 1;
if (Numbers.Contains(numberToAdd) && IndexOfNumbers != groupToAddItTo)
{
active = IndexOfNumbers;
IndexOfListToRefresh = IndexOfNumbers - 1;
for (int j = 0; j < Numbers.Count; j++)
{
int Number = NumbersToMove[j];
NumbersToMove.Add(Number);
}
}
}
for(int i = 0; i < NumbersToMove.Count; i++)
{
int Number = NumbersToMove[i];
if (!group[groupToAddItTo - 1].Contains(Number))
{
group[groupToAddItTo - 1].Add(Number);
}
}
if (!group[groupToAddItTo - 1].Contains(numberToAdd))
{
group[groupToAddItTo - 1].Add(numberToAdd);
}
if (IndexOfListToRefresh != -1)
{
group[IndexOfListToRefresh] = new List<int>();
}
}
catch//(Exception ex)
{
//Exception handling here
}
}
public static string GetString()
{
string MethodResult = "";
try
{
string Working = "";
bool FirstPass = true;
for(int i = 0; i < group.Count; i++)
{
List<int> Numbers = group[i];
if (!FirstPass)
{
Working += "\r\n";
}
else
{
FirstPass = false;
}
Working += "group[" + (group.IndexOf(Numbers) + 1) + "]{";
bool InnerFirstPass = true;
for(int j = 0; j < Numbers.Count; j++)
{
int Number = Numbers[j];
if (!InnerFirstPass)
{
Working += ", ";
}
else
{
InnerFirstPass = false;
}
Working += Number.ToString();
}
Working += "};";
if ((active - 1) == group.IndexOf(Numbers))
{
Working += " //<active>";
}
}
MethodResult = Working;
}
catch//(Exception ex)
{
//Exception handling here
}
return MethodResult;
}
}
}
Both implimentations contain the group variable and two methods, which are; AddNumberToGroup and GetString, where GetString is used to check the current status of the group variable.
Note: You'll need to replace XXXNAMESPACEXXX with the Namespace of your project. Hint: Take this from another class.
When adding an item to your List, do this:
int NumberToAdd = 10;
int GroupToAddItTo = 2;
AddNumberToGroup(NumberToAdd, GroupToAddItTo);
...or...
AddNumberToGroup(10, 2);
In the example above, I am adding the number 10 to group 2.
Test the speed with the following:
DateTime StartTime = DateTime.Now;
int NumberOfTimesToRepeatTest = 1000;
for (int i = 0; i < NumberOfTimesToRepeatTest; i++)
{
Groups.AddNumberToGroup(4, 1);
Groups.AddNumberToGroup(3, 1);
Groups.AddNumberToGroup(8, 2);
Groups.AddNumberToGroup(5, 2);
Groups.AddNumberToGroup(7, 3);
Groups.AddNumberToGroup(3, 3);
Groups.AddNumberToGroup(8, 4);
Groups.AddNumberToGroup(43, 4);
Groups.AddNumberToGroup(100, 5);
Groups.AddNumberToGroup(1, 5);
Groups.AddNumberToGroup(5, 6);
Groups.AddNumberToGroup(78, 6);
Groups.AddNumberToGroup(34, 7);
Groups.AddNumberToGroup(456, 7);
Groups.AddNumberToGroup(456, 8);
Groups.AddNumberToGroup(7, 8);
Groups.AddNumberToGroup(7, 9);
}
long MillisecondsTaken = DateTime.Now.Ticks - StartTime.Ticks;
Console.WriteLine(Groups.GetString());
Console.WriteLine("Process took: " + MillisecondsTaken);
I think this is what you need. Let me know if I misunderstood anything in the question.
As far as I can tell it's brilliant, it's fast and it's tested.
Enjoy!
...and one more thing:
For the little windows interface app, I just created a simple winforms app with three textboxes (one set to multiline) and a button.
Then, after adding the Groups class above, in the button-click event I wrote the following:
private void BtnAdd_Click(object sender, EventArgs e)
{
try
{
int Group = int.Parse(TxtGroup.Text);
int Number = int.Parse(TxtNumber.Text);
Groups.AddNumberToGroup(Number, Group);
TxtOutput.Text = Groups.GetString();
}
catch//(Exception ex)
{
//Exception handling here
}
}

Split a range sequence into multiple string c#,linq [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
Improve this question
Not sure why question is being marked as offtopic, where as so called desired behaviour is included within the question post!
I am trying to write this program that takes two inputs:
• a set of include intervals
• and a set of exclude intervals
The sets of intervals can be given in any order, and they may be empty or overlapping. The program should output the result of taking all the includes and “remove” the excludes. The output should be given as non-overlapping intervals in a sorted order.
Intervals will contain Integers only
Example :
Includes: 50-600, 10-100
Excludes: (empty)
Output: 10-600
Includes: 10-100, 200-300, 400-600
Excludes: 95-205, 410-420
Output: 10-94, 206-300, 400-409, 421-600
I tried to populate two Enumerable Range from include and excludes (after splitting,parsing ), but didn't find any efficient way of implementing this afterwards.
string[] _break = _string.Split(',');
string[] _breakB = _stringB.Split(',');
string[] res = new string[_break.Length + 1];
string[] _items, _itemsB;
List < int > _back = new List < int > ();
int count = 0;
foreach(var _item in _break) {
_items = _item.Split('-');
var a = Enumerable.Range(int.Parse(_items[0]), (int.Parse(_items[1]) - int.Parse(_items[0]) + 1)).ToList();
foreach(var _itemB in _breakB) {
_itemsB = _itemB.Split('-');
var b = Enumerable.Range(int.Parse((_itemsB[0])), (int.Parse(_itemsB[1]) - int.Parse((_itemsB[0])) + 1)).ToList();
var c = a.Except < int > (b).ToList();
/// different things tried here, but they are not good
res[count] = c.Min().ToString() + "-" + c.Max().ToString();
count++;
}
}
return res;
Any input will be of great help
You can use the Built-in SortedSet<T> collection to do most of the work for you like this:
The SortedSet<T> collection implements the useful UnionWith and ExceptWith methods which at least makes the code quite easy to follow:
private void button1_Click(object sender, EventArgs e)
{
string[] includeRanges = _string.Text.Replace(" ", "").Split(',');
string[] excludeRanges = _stringB.Text.Replace(" ", "").Split(',');
string[] includeRange, excludeRange;
SortedSet<int> includeSet = new SortedSet<int>();
SortedSet<int> excludeSet = new SortedSet<int>();
// Create a UNION of all the include ranges
foreach (string item in includeRanges)
{
includeRange = item.Split('-');
includeSet.UnionWith(Enumerable.Range(int.Parse(includeRange[0]), (int.Parse(includeRange[1]) - int.Parse(includeRange[0]) + 1)).ToList());
}
// Create a UNION of all the exclude ranges
foreach (string item in excludeRanges)
{
excludeRange = item.Split('-');
excludeSet.UnionWith(Enumerable.Range(int.Parse(excludeRange[0]), (int.Parse(excludeRange[1]) - int.Parse(excludeRange[0]) + 1)).ToList());
}
// Exclude the excludeSet from the includeSet
includeSet.ExceptWith(excludeSet);
//Format the output using a stringbuilder
StringBuilder sb = new StringBuilder();
int lastValue = -1;
foreach (int included in includeSet)
{
if (lastValue == -1)
{
sb.Append(included + "-");
lastValue = included;
}
else
{
if (lastValue == included - 1)
{
lastValue = included;
}
else
{
sb.Append(lastValue + ",");
sb.Append(included + "-");
lastValue = included;
}
}
}
sb.Append(lastValue);
result.Text = sb.ToString();
}
This should work faster than SortedSet trick, at least for large intervals. Idea is like:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace Test
{
using Pair = Tuple<int, int>; //for brevity
struct Point //point of an interval
{
public enum Border { Left, Right };
public enum Interval { Including, Excluding };
public int Val;
public int Brdr;
public int Intr;
public Point(int value, Border border, Interval interval)
{
Val = value;
Brdr = (border == Border.Left) ? 1 : -1;
Intr = (int)interval;
}
public override string ToString() =>
(Brdr == 1 ? "L" : "R") + (Intr == 0 ? "+ " : "- ") + Val;
}
class Program
{
static IEnumerable<Pair> GetInterval(string strIn, string strEx)
{
//a func to get interval border points from string:
Func<string, Point.Interval, IEnumerable<Point>> parse = (str, intr) =>
Regex.Matches(str, "[0-9]+").Cast<Match>().Select((s, idx) =>
new Point(int.Parse(s.Value), (Point.Border)(idx % 2), intr));
var INs = parse(strIn, Point.Interval.Including);
var EXs = parse(strEx, Point.Interval.Excluding);
var intrs = new int[2]; //current interval border control IN[0], EX[1]
int start = 0; //left border of a new resulting interval
//put all points in a line and loop:
foreach (var p in INs.Union(EXs).OrderBy(x => x.Val))
{
//check for start (close) of a new (cur) interval:
var change = (intrs[p.Intr] == 0) ^ (intrs[p.Intr] + p.Brdr == 0);
intrs[p.Intr] += p.Brdr;
if (!change) continue;
var In = p.Intr == 0 && intrs[1] == 0; //w no Ex
var Ex = p.Intr == 1 && intrs[0] > 0; //breaks In
var Open = intrs[p.Intr] > 0;
var Close = !Open;
if (In && Open || Ex && Close)
{
start = p.Val + p.Intr; //exclude point if Ex
}
else if (In && Close || Ex && Open)
{
yield return new Pair(start, p.Val - p.Intr);
}
}
}
static void Main(string[] args)
{
var strIN = "10-100, 200-300, 400-500, 420-480";
var strEX = "95-205, 410-420";
foreach (var i in GetInterval(strIN, strEX))
Console.WriteLine(i.Item1 + "-" + i.Item2);
Console.ReadLine();
}
}
}
So, you task could be separated to the list of subtasks:
Parse a source line of intervals to the list of objects
Concatinate intervals if they cross each over
Excludes intervals 'excludes' from 'includes'
I published my result code here: http://rextester.com/OBXQ56769
The code could be optimized as well, but I wanted it to be quite simple. Hope it will help you.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication
{
public class Program
{
private const string Includes = "10-100, 200-300, 400-500 ";
private const string Excludes = "95-205, 410-420";
private const string Pattern = #"(\d*)-(\d*)";
public static void Main(string[] args)
{
var includes = ParseIntevals(Includes);
var excludes = ParseIntevals(Excludes);
includes = ConcatinateIntervals(includes);
excludes = ConcatinateIntervals(excludes);
// The Result
var result = ExcludeFromInclude(includes, excludes);
foreach (var interval in result)
{
Console.WriteLine(interval.Min + "-" + interval.Max);
}
}
/// <summary>
/// Excludes intervals 'excludes' from 'includes'
/// </summary>
public static List<Interval> ExcludeFromInclude(List<Interval> includes, List<Interval> excludes)
{
var result = new List<Interval>();
if (!excludes.Any())
{
return includes.Select(x => x.Clone()).ToList();
}
for (int i = 0; i < includes.Count; i++)
{
for (int j = 0; j < excludes.Count; j++)
{
if (includes[i].Max < excludes[j].Min || includes[i].Min > excludes[j].Max)
continue; // no crossing
//1 Example: includes[i]=(10-20) excludes[j]=(15-25)
if (includes[i].Min < excludes[j].Min && includes[i].Max <= excludes[j].Max)
{
var interval = new Interval(includes[i].Min, excludes[j].Min - 1);
result.Add(interval);
break;
}
//2 Example: includes[i]=(10-25) excludes[j]=(15-20)
if (includes[i].Min <= excludes[j].Min && includes[i].Max >= excludes[j].Max)
{
if (includes[i].Min < excludes[j].Min)
{
var interval1 = new Interval(includes[i].Min, excludes[j].Min - 1);
result.Add(interval1);
}
if (includes[i].Max > excludes[j].Max)
{
var interval2 = new Interval(excludes[j].Max + 1, includes[i].Max);
result.Add(interval2);
}
break;
}
//3 Example: includes[i]=(15-25) excludes[j]=(10-20)
if (includes[i].Min < excludes[j].Max && includes[i].Max > excludes[j].Max)
{
var interval = new Interval(excludes[j].Max + 1, includes[i].Max);
result.Add(interval);
break;
}
}
}
return result;
}
/// <summary>
/// Concatinates intervals if they cross each over
/// </summary>
public static List<Interval> ConcatinateIntervals(List<Interval> intervals)
{
var result = new List<Interval>();
for (int i = 0; i < intervals.Count; i++)
{
for (int j = 0; j < intervals.Count; j++)
{
if (i == j)
continue;
if (intervals[i].Max < intervals[j].Min || intervals[i].Min > intervals[j].Max)
{
Interval interval = intervals[i].Clone();
result.Add(interval);
continue; // no crossing
}
//1
if (intervals[i].Min < intervals[j].Min && intervals[i].Max < intervals[j].Max)
{
var interval = new Interval(intervals[i].Min, intervals[j].Max);
result.Add(interval);
break;
}
//2
if (intervals[i].Min < intervals[j].Min && intervals[i].Max > intervals[j].Max)
{
Interval interval = intervals[i].Clone();
result.Add(interval);
break;
}
//3
if (intervals[i].Min < intervals[j].Max && intervals[i].Max > intervals[j].Max)
{
var interval = new Interval(intervals[j].Min, intervals[i].Max);
result.Add(interval);
break;
}
//4
if (intervals[i].Min > intervals[j].Min && intervals[i].Max < intervals[j].Max)
{
var interval = new Interval(intervals[j].Min, intervals[j].Max);
result.Add(interval);
break;
}
}
}
return result.Distinct().ToList();
}
/// <summary>
/// Parses a source line of intervals to the list of objects
/// </summary>
public static List<Interval> ParseIntevals(string intervals)
{
var matches = Regex.Matches(intervals, Pattern, RegexOptions.IgnoreCase);
var list = new List<Interval>();
foreach (Match match in matches)
{
var min = int.Parse(match.Groups[1].Value);
var max = int.Parse(match.Groups[2].Value);
list.Add(new Interval(min, max));
}
return list.OrderBy(x => x.Min).ToList();
}
/// <summary>
/// Interval
/// </summary>
public class Interval
{
public int Min { get; set; }
public int Max { get; set; }
public Interval()
{
}
public Interval(int min, int max)
{
Min = min;
Max = max;
}
public override bool Equals(object obj)
{
var obj2 = obj as Interval;
if (obj2 == null) return false;
return obj2.Min == Min && obj2.Max == Max;
}
public override int GetHashCode()
{
return this.ToString().GetHashCode();
}
public override string ToString()
{
return string.Format("{0}-{1}", Min, Max);
}
public Interval Clone()
{
return (Interval) this.MemberwiseClone();
}
}
}
}
Lots of ways to solve this. The LINQ approach hasn't been discussed yet - this is how I would do it:
// declaring a lambda fn because it's gonna be used by both include/exclude
// list
Func<string, IEnumerable<int>> rangeFn =
baseInput =>
{
return baseInput.Split (new []{ ',', ' ' },
StringSplitOptions.RemoveEmptyEntries)
.SelectMany (rng =>
{
var range = rng.Split (new []{ '-' },
StringSplitOptions.RemoveEmptyEntries)
.Select(i => Convert.ToInt32(i));
// just in case someone types in
// a reverse range (e.g. 10-5), LOL...
var start = range.Min ();
var end = range.Max ();
return Enumerable.Range (start, (end - start + 1));
});
};
var includes = rangeFn (_string);
var excludes = rangeFn (_stringB);
var result = includes.Except (excludes).Distinct().OrderBy(r => r);

fastest way to compare string elements with each other

I have a list with a lot of strings (>5000) where I have to take the first element and compare it to all following elements. Eg. consider this list of string:
{ one, two, three, four, five, six, seven, eight, nine, ten }. Now I take one and compare it with two, three, four, ... afterwards I take two and compare it with three, four, ...
I believe the lookup is the problem why this takes so long. On a normal hdd (7200rpm) it takes about 30 seconds, on a ssd 10 seconds. I just don't know how I can speed this up even more. All strings inside the list are ordered by ascending and it is important to check them in this order. If it can speed things up considerably I would not mind to have an unordered list.
I took a look into hashset but I need the checking order so that would not work even with the fast contain method.
EDIT: As it looks like I am not clear enough and as wanted by Dusan here is the complete code. My problem case: I have a lot of directories, with similar names and am getting a collection with all directory names only and comparing them with each other for similarity and writing that. Hence the comparison between hdd and ssd. But that is weird because I am not writing instantly, instead putting it in a field and writing in the end. Still there is a difference in speed.
Why did I not include the whole code? Because I believe my core issue here is the lookup of value from the list and the comparison between the 2 strings. Everything else should already be sufficiently fast, adding to list, looking in the blacklist (hashset) and getting a list of dir names.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Text.RegularExpressions;
using System.Diagnostics;
using System.Threading;
namespace Similarity
{
/// <summary>
/// Credit http://www.dotnetperls.com/levenshtein
/// Contains approximate string matching
/// </summary>
internal static class LevenshteinDistance
{
/// <summary>
/// Compute the distance between two strings.
/// </summary>
public static int Compute(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
// Step 1
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
// Step 6
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
internal class Program
{
#region Properties
private static HashSet<string> _blackList = new HashSet<string>();
public static HashSet<string> blackList
{
get
{
return _blackList;
}
}
public static void AddBlackListEntry(string line)
{
blackList.Add(line);
}
private static List<string> _similar = new List<string>();
public static List<string> similar
{
get
{
return _similar;
}
}
public static void AddSimilarEntry(string s)
{
similar.Add(s);
}
#endregion Properties
private static void Main(string[] args)
{
Clean();
var directories = Directory.EnumerateDirectories(Directory.GetCurrentDirectory(), "*", SearchOption.TopDirectoryOnly)
.Select(x => new DirectoryInfo(x).Name).OrderBy(y => new DirectoryInfo(y).Name).ToList();
using (StreamWriter sw = new StreamWriter(#"result.txt"))
{
foreach (var item in directories)
{
Console.WriteLine(item);
sw.WriteLine(item);
}
Console.WriteLine("Amount of directories: " + directories.Count());
}
if (directories.Count != 0)
{
StartSimilarityCheck(directories);
}
else
{
Console.WriteLine("No directories");
}
WriteResult(similar);
Console.WriteLine("Finish. Press any key to exit...");
Console.ReadKey();
}
private static void StartSimilarityCheck(List<string> whiteList)
{
int counter = 0; // how many did we check yet?
var watch = Stopwatch.StartNew();
foreach (var dirName in whiteList)
{
bool insertDirName = true;
if (!IsBlackList(dirName))
{
// start the next element
for (int i = counter + 1; i <= whiteList.Count; i++)
{
// end of index reached
if (i == whiteList.Count)
{
break;
}
int similiariy = LevenshteinDistance.Compute(dirName, whiteList[i]);
// low score means high similarity
if (similiariy < 2)
{
if (insertDirName)
{
//Writer(dirName);
AddSimilarEntry(dirName);
insertDirName = false;
}
//Writer(whiteList[i]);
AddSimilarEntry(dirName);
AddBlackListEntry(whiteList[i]);
}
}
}
Console.WriteLine(counter);
//Console.WriteLine("Skip: {0}", dirName);
counter++;
}
watch.Stop();
Console.WriteLine("Time: " + watch.ElapsedMilliseconds / 1000);
}
private static void WriteResult(List<string> list)
{
using (StreamWriter sw = new StreamWriter(#"similar.txt", true, Encoding.UTF8, 65536))
{
foreach (var item in list)
{
sw.WriteLine(item);
}
}
}
private static void Clean()
{
// yeah hardcoded file names incoming. Better than global variables??
try
{
if (File.Exists(#"similar.txt"))
{
File.Delete(#"similar.txt");
}
if (File.Exists(#"result.txt"))
{
File.Delete(#"result.txt");
}
}
catch (Exception)
{
throw;
}
}
private static void Writer(string s)
{
using (StreamWriter sw = new StreamWriter(#"similar.txt", true, Encoding.UTF8, 65536))
{
sw.WriteLine(s);
}
}
private static bool IsBlackList(string name)
{
return blackList.Contains(name);
}
}
To fix the bottleneck which is the second for-loop insert an if-condition which checks if similiariy is >= than what we want, if that is the case then break the loop. now it runs in 1 second. thanks everyone
Your inner loop uses a strange double check. This may prevent an important JIT optimization, removal of redundant range checks.
//foreach (var item myList)
for (int j = 0; j < myList.Count-1; j++)
{
string item1 = myList[j];
for (int i = j + 1; i < myList.Count; i++)
{
string item2 = myList[i];
// if (i == myList.Count)
...
}
}
The amount of downvotes is crazy but oh well... I found the reason for my performance issue / bottleneck thanks to the comments.
The second for loop inside StartSimilarityCheck() iterates over all entries, which in itself is no problem but when viewed under performance issues and efficient, is bad. The solution is to only check strings which are in the neighborhood but how do we know if they are?
First, we get a list which is ordered by ascension. That gives us a rough overview of similar strings. Now we define a threshold of Levenshtein score (smaller score is higher similarity between two strings). If the score is higher than the threshold it means they are not too similar, thus we can break out of the inner loop. That saves time and the program can finish really fast. Notice that that way is not bullet proof, IMHO because if the first string is 0Directory it will be at the beginning part of the list but a string like zDirectory will be further down and could be missed. Correct me if I am wrong..
private static void StartSimilarityCheck(List<string> whiteList)
{
var watch = Stopwatch.StartNew();
for (int j = 0; j < whiteList.Count - 1; j++)
{
string dirName = whiteList[j];
bool insertDirName = true;
int threshold = 2;
if (!IsBlackList(dirName))
{
// start the next element
for (int i = j + 1; i < whiteList.Count; i++)
{
// end of index reached
if (i == whiteList.Count)
{
break;
}
int similiarity = LevenshteinDistance.Compute(dirName, whiteList[i]);
if (similiarity >= threshold)
{
break;
}
// low score means high similarity
if (similiarity <= threshold)
{
if (insertDirName)
{
AddSimilarEntry(dirName);
AddSimilarEntry(whiteList[i]);
AddBlackListEntry(whiteList[i]);
insertDirName = false;
}
else
{
AddBlackListEntry(whiteList[i]);
}
}
}
}
Console.WriteLine(j);
}
watch.Stop();
Console.WriteLine("Ms: " + watch.ElapsedMilliseconds);
Console.WriteLine("Similar entries: " + similar.Count);
}

Auto Generate alphanumeric Unique Id with C#

Total string length is 5 chars
I have a scenario, ID starts with
A0001 and ends with A9999 then
B0001 to B9999 until F0001 to f9999
after that
FA001 to FA999 then
FB001 to FB999 until ....FFFF9
Please suggest any idea on how to create this format.
public static IEnumerable<string> Numbers()
{
return Enumerable.Range(0xA0000, 0xFFFF9 - 0xA0000 + 1)
.Select(x => x.ToString("X"));
}
You could also have an id generator class:
public class IdGenerator
{
private const int Min = 0xA0000;
private const int Max = 0xFFFF9;
private int _value = Min - 1;
public string NextId()
{
if (_value < Max)
{
_value++;
}
else
{
_value = Min;
}
return _value.ToString("X");
}
}
I am a few years late. But I hope my answer will help everyone looking for a good ID Generator. None of the previous answers work as expected and do not answer this question.
My answer fits the requirements perfectly. And more!!!
Notice that setting the _fixedLength to ZERO will create dynamically sized ID's.
Setting it to anything else will create FIXED LENGTH ID's;
Notice also that calling the overload that takes a current ID will "seed" the class and consecutive calls DO NOT need to be called with another ID. Unless you had random ID's and need the next one on each.
Enjoy!
public static class IDGenerator
{
private static readonly char _minChar = 'A';
private static readonly char _maxChar = 'C';
private static readonly int _minDigit = 1;
private static readonly int _maxDigit = 5;
private static int _fixedLength = 5;//zero means variable length
private static int _currentDigit = 1;
private static string _currentBase = "A";
public static string NextID()
{
if(_currentBase[_currentBase.Length - 1] <= _maxChar)
{
if(_currentDigit <= _maxDigit)
{
var result = string.Empty;
if(_fixedLength > 0)
{
var prefixZeroCount = _fixedLength - _currentBase.Length;
if(prefixZeroCount < _currentDigit.ToString().Length)
throw new InvalidOperationException("The maximum length possible has been exeeded.");
result = result = _currentBase + _currentDigit.ToString("D" + prefixZeroCount.ToString());
}
else
{
result = _currentBase + _currentDigit.ToString();
}
_currentDigit++;
return result;
}
else
{
_currentDigit = _minDigit;
if(_currentBase[_currentBase.Length - 1] == _maxChar)
{
_currentBase = _currentBase.Remove(_currentBase.Length - 1) + _minChar;
_currentBase += _minChar.ToString();
}
else
{
var newChar = _currentBase[_currentBase.Length - 1];
newChar++;
_currentBase = _currentBase.Remove(_currentBase.Length - 1) + newChar.ToString();
}
return NextID();
}
}
else
{
_currentDigit = _minDigit;
_currentBase += _minChar.ToString();
return NextID();
}
}
public static string NextID(string currentId)
{
if(string.IsNullOrWhiteSpace(currentId))
return NextID();
var charCount = currentId.Length;
var indexFound = -1;
for(int i = 0; i < charCount; i++)
{
if(!char.IsNumber(currentId[i]))
continue;
indexFound = i;
break;
}
if(indexFound > -1)
{
_currentBase = currentId.Substring(0, indexFound);
_currentDigit = int.Parse(currentId.Substring(indexFound)) + 1;
}
return NextID();
}
}
This is a sample of the ouput using _fixedLength = 4 and _maxDigit = 5
A001
A002
A003
A004
A005
B001
B002
B003
B004
B005
C001
C002
C003
C004
C005
AA01
AA02
AA03
AA04
AA05
AB01
AB02
AB03
AB04
AB05
AC01
AC02
AC03
AC04
AC05
see this code
private void button1_Click(object sender, EventArgs e)
{
string get = label1.Text.Substring(7); //label1.text=ATHCUS-100
MessageBox.Show(get);
string ou="ATHCUS-"+(Convert.ToInt32(get)+1).ToString();
label1.Text = ou.ToString();
}
Run this query in order to get the last ID in the database
SELECT TOP 1 [ID_COLUMN] FROM [NAME_OF_TABLE] ORDER BY [ID_COLUMN] DESC
Read the result to a variable and then run the following function on the result in order to get the next ID.
public string NextID(string lastID)
{
var allLetters = new string[] {"A", "B", "C", "D", "E", "F"};
var lastLetter = lastID.Substring(0, 1);
var lastNumber = int.Parse(lastID.Substring(1));
if (Array.IndexOf(allLetters, lastLetter) < allLetters.Length - 1 &&
lastNumber == 9999)
{
//increase the letter
lastLetter = allLetters(Array.IndexOf(allLetters, lastLetter) + 1);
lastNumber = 0;
} else {
lastLetter = "!";
}
var result = lastLetter + (lastNumber + 1).ToString("0000");
//ensure we haven't exceeded the upper limit
if (result.SubString(0, 1) == "!") {
result = "Upper Bounds Exceeded!";
}
return result;
}
DISCLAIMER
This code will only generate the first set of IDs. I do not understand the process of generating the second set.
If you need to take it from the database and do this you can use something like the following.
int dbid = /* get id from db */
string id = dbid.ToString("X5");
This should give you the format you are looking for as a direct convert from the DB ID.

Categories