How to pick items from a list of lists - C#

I need a bit of help. I have a list of 5k elements, each of which is a path to a file. I split the list into five smaller lists. My problem is: how can I loop over the list of lists and pick an element from all five lists in the same iteration?
Sample of code.
The source list:
List<string> sourceDir = new List<string>(new string[] { "C:/Temp/data/a.txt", "C:/Temp/data/s.txt", "C:/Temp/data/d.txt", "C:/Temp/data/f.txt", "C:/Temp/data/g.txt", "C:/Temp/data/h.txt", "C:/Temp/data/j.txt", "C:/Temp/data/k.txt", "C:/Temp/data/l.txt", "C:/Temp/data/z.txt"});
Splitting the list into smaller list:
public static List<List<T>> Split<T>(IList<T> source)
{
    return source
        .Select((x, i) => new { Index = i, Value = x })
        .GroupBy(x => x.Index / 2)
        .Select(x => x.Select(v => v.Value).ToList())
        .ToList();
}
Result:
var list = Split(sourceDir);
As a result, the variable list holds five lists. How can I now access items from all of the lists in one iteration for further processing?
Something like:
foreach (string fileName in list[0])
{
    foreach (string fileName1 in list[1])
    {
        foreach (string fileName2 in list[2])
        {
            foreach (string fileName3 in list[3])
            {
                foreach (string fileName4 in list[4])
                {
                    // have items from all lists
                    Console.WriteLine("First name is: " + fileName);
                    Console.WriteLine("Second name is: " + fileName1);
                    Console.WriteLine("Third name is: " + fileName2);
                    Console.WriteLine("Fourth name is: " + fileName3);
                    Console.WriteLine("Fifth name is: " + fileName4);
                    break;
                }
                break;
            }
            break;
        }
        break;
    }
}
The foreach loops above are just to give an idea of what I need.

Multithreading file I/O operations is not always a good choice. You add the overhead of thread switches, but disk access is bound by other constraints. See, for example:
Does multithreading make sense for IO-bound operations?
However, just to answer your question: you can use a standard for loop instead of all those foreach loops. The only thing you need to take care of is the case where the sublists don't have the same number of elements (i.e. the number of files is not exactly divisible by 5).
int maxIndex = Math.Max(list[0].Count,
               Math.Max(list[1].Count,
               Math.Max(list[2].Count,
               Math.Max(list[3].Count, list[4].Count))));
for (int x = 0; x < maxIndex; x++)
{
    string item0 = x < list[0].Count ? list[0][x] : "No item";
    string item1 = x < list[1].Count ? list[1][x] : "No item";
    string item2 = x < list[2].Count ? list[2][x] : "No item";
    string item3 = x < list[3].Count ? list[3][x] : "No item";
    string item4 = x < list[4].Count ? list[4][x] : "No item";
    Console.WriteLine("First name is: " + item0);
    Console.WriteLine("Second name is: " + item1);
    Console.WriteLine("Third name is: " + item2);
    Console.WriteLine("Fourth name is: " + item3);
    Console.WriteLine("Fifth name is: " + item4);
}
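For what it's worth, the five hard-coded Math.Max calls generalize to any number of sublists with a single Max over the list of lists. Here is a self-contained sketch (the sample sublists are hypothetical, standing in for the result of the Split helper above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical sublists, as a Split-style helper might produce them.
        var list = new List<List<string>>
        {
            new List<string> { "a.txt", "s.txt" },
            new List<string> { "d.txt", "f.txt" },
            new List<string> { "g.txt" }          // deliberately shorter
        };

        // Replaces the nested Math.Max calls: the longest sublist decides the loop bound.
        int maxIndex = list.Max(sub => sub.Count);

        for (int x = 0; x < maxIndex; x++)
        {
            // One pass picks position x from every sublist, padding short ones.
            for (int i = 0; i < list.Count; i++)
            {
                string item = x < list[i].Count ? list[i][x] : "No item";
                Console.WriteLine($"List {i}, position {x}: {item}");
            }
        }
    }
}
```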

Related

Merging two items into one inside a List<string> in C#

I have a List<string> named data with these items:
{ A101, Plans, A102, Elev/Sec, A103, Unnamed }
foreach (string item in data)
{
    data1.Add(item + item);
}
I wanted the output to be like this:
A101 Plans
A102 Elev/Sec
A103 Unnamed
But the output is:
A101 A101
Plans Plans
A102 A102
Elev/Sec Elev/Sec
A103 A103
Unnamed Unnamed
How can I fix this problem?
It's not easy to do with a foreach: you'd have to remember the previous element, emit the combined pair once the remembered value is set, and then clear it again.
It's easier with a straight for loop that advances in steps of two:
for (int i = 0; i + 1 < list.Count; i += 2)
{
    Console.WriteLine(list[i] + " " + list[i + 1]);
}
With a foreach it's like:
string x = null;
foreach(var item in list){
if(x == null){
x = item;
} else {
Console.WriteLine(x + " " + item);
x = null;
}
}
x flip-flops between null and a value: when it is null, the current item is remembered; when it holds something, the output is the previous item (x) plus the current item.
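On .NET 6 or later, Enumerable.Chunk expresses the same pairing directly. A sketch using the sample data from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var data = new List<string> { "A101", "Plans", "A102", "Elev/Sec", "A103", "Unnamed" };

        // Chunk(2) yields arrays of at most two consecutive elements.
        foreach (var pair in data.Chunk(2))
        {
            // string.Join also copes with a trailing odd element.
            Console.WriteLine(string.Join(" ", pair));
        }
    }
}
```

This prints "A101 Plans", "A102 Elev/Sec", "A103 Unnamed", one pair per line.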

Linq Or IEnumerable taking Long to run when using Parallel.ForEach [closed]

Closed. This question needs debugging details. It is not currently accepting answers. Closed 5 years ago.
I have an application that reads a CSV file (200 MB).
var lines = File.ReadLines(file).ToList();
The CSV stores pricing information and has around 500k records in it.
The code snippet below takes around 18 seconds when it calls StoreValues.
Is there a way to speed this up?
distinctMarketIds holds 54 int values.
The lines collection has 500k lines, and each line's first field (line[0]) is the marketId I'm matching against.
IEnumerable<string[]> newList = lines
    .Where(t => distinctMarketIds.Contains(t.Split(',')[0]))
    .Select(t => t.Split(','));
log.Info(("Time Taken To Get Filtered Prices " + elapsed.Elapsed.TotalSeconds + " seconds."));
StoreValues(newList, file); // store the prices
log.Info(("Time Taken To Store Prices " + elapsed.Elapsed.TotalSeconds + " seconds."));
The StoreValues method uses Parallel.ForEach:
Parallel.ForEach(finalLines, new ParallelOptions { MaxDegreeOfParallelism = MaxThreads }, (line) =>
{
});
I cannot see why it would take 18 seconds to get through this loop.
I have tested on another machine with similar specs, and there the StoreValues method takes 2.5 seconds.
#region LoadPriceDataFromCsvFile
public int LoadPriceDataFromCsvFile(string filename, string[] marketIdList, int maxThreads)
{
    MaxThreads = maxThreads;
    int filteredRows = 0;
    string[] files = Directory.GetFiles(filename, "*.csv");
    elapsed.Start();
    log.InfoFormat("Total Csv files to Scan {0}", files.Length);
    Parallel.ForEach(files, new ParallelOptions { MaxDegreeOfParallelism = MaxThreads }, (file) =>
    {
        try
        {
            log.InfoFormat("About to Scan File {0}", file);
            ScanCsvFilesAndGetPrices(file);
        }
        catch (System.OutOfMemoryException e)
        {
            log.Info(e);
        }
        catch (Exception e)
        {
            log.Info(e);
        }
    });
    return PriceCollection.Count;
}
#endregion
#region ScanCsvFilesAndGetPrices
private void ScanCsvFilesAndGetPrices(string file)
{
    try
    {
        log.Info(("Time Taken " + elapsed.Elapsed.TotalSeconds + " seconds."));
        var lines = File.ReadLines(file).ToList();
        log.Info(("Time Taken To Read csv " + elapsed.Elapsed.TotalSeconds + " seconds."));
        if (lines.Any())
        {
            log.Info(("Time Taken To Read Any " + elapsed.Elapsed.TotalSeconds + " seconds."));
            var firstLine = lines.ElementAt(1); // This is the first data line (the headers are in lines.First())
            log.Info(("Time Taken To Read First Line " + elapsed.Elapsed.TotalSeconds + " seconds."));
            var lastLine = lines.Last(); // This is the last line in the csv file
            log.Info(("Time Taken To Read Last Line " + elapsed.Elapsed.TotalSeconds + " seconds."));
            var header = lines.First().Split(',');
            log.Info(("Time Taken To Split Header Line " + elapsed.Elapsed.TotalSeconds + " seconds."));
            GetIndexOfFields(header);
            log.Info(("Time Taken To Read Header " + elapsed.Elapsed.TotalSeconds + " seconds."));
            // Get the publish date time
            if (PublishedDatetime_Index != -1)
            {
                var fLine = firstLine.Split(',');
                var lLine = lastLine.Split(',');
                var firstLineDatetime = fLine[PublishedDatetime_Index].Contains("+")
                    ? fLine[PublishedDatetime_Index].Remove(fLine[PublishedDatetime_Index].IndexOf("+", StringComparison.Ordinal))
                    : fLine[PublishedDatetime_Index];
                var publishDateTimeFirstLine = FileNameGenerator.GetCorrectTime(Convert.ToDateTime(firstLineDatetime));
                string lastLineDatetime = lLine[PublishedDatetime_Index].Contains("+")
                    ? lLine[PublishedDatetime_Index].Remove(lLine[PublishedDatetime_Index].IndexOf("+", StringComparison.Ordinal))
                    : lLine[PublishedDatetime_Index];
                var publishDateTimeLastLine = FileNameGenerator.GetCorrectTime(Convert.ToDateTime(lastLineDatetime));
                // Check whether the order execution date time of any order lies between the date times of the
                // first and last lines of the csv, so we can add that csv to our filtered list
                string[] distinctMarketIds = OrderEntityColection
                    .FindAll(obj => obj.OrderLastChangeDateTimeUtc >= publishDateTimeFirstLine
                                 && obj.OrderLastChangeDateTimeUtc <= publishDateTimeLastLine)
                    .Select(obj => obj.MarketId.ToString())
                    .Distinct()
                    .ToArray();
                log.InfoFormat("Total Markets Identified {0}", distinctMarketIds.Length);
                List<OrderEntity> foundOrdersList = OrderEntityColection
                    .FindAll(obj => obj.OrderLastChangeDateTimeUtc >= publishDateTimeFirstLine
                                 && obj.OrderLastChangeDateTimeUtc <= publishDateTimeLastLine);
                lock (FoundOrdersList)
                {
                    FoundOrdersList.AddRange(foundOrdersList);
                }
                log.InfoFormat("Total Orders Identified {0}", FoundOrdersList.Count());
                log.Info(("Time Taken To Read Execution Times and Market " + elapsed.Elapsed.TotalSeconds + " seconds."));
                if (distinctMarketIds.Length != 0)
                {
                    IEnumerable<string[]> newList = lines
                        .Where(t => distinctMarketIds.Contains(t.Split(',')[0]))
                        .Select(t => t.Split(','));
                    log.Info(("Time Taken To Get Filtered Prices " + elapsed.Elapsed.TotalSeconds + " seconds."));
                    // This is taking longer than expected. Something to do with IEnumerable<string[]>
                    StoreValues(newList, file); // store the prices
                    log.Info(("Time Taken To Store Prices " + elapsed.Elapsed.TotalSeconds + " seconds."));
                }
            }
        }
    }
    catch (Exception e)
    {
        log.Info(e);
    }
}
#endregion
#region GetIndexOfFields
// These are the fields we look for in the header, recording the position of each
private void GetIndexOfFields(IEnumerable<string> lineHeader)
{
    int index = 0;
    foreach (var column in lineHeader)
    {
        if (column == "MarketId")
        {
            MarketId_Index = index;
        }
        if (column == "Bid")
        {
            Bid_Index = index;
        }
        if (column == "Ask")
        {
            Ask_Index = index;
        }
        if (column == "Mid")
        {
            Mid_Index = index;
        }
        if (column == "Is_Indicative")
        {
            Is_Indicative_Index = index;
        }
        if (column == "Price_Engine")
        {
            Price_Engine_Index = index;
        }
        if (column == "PublishedDatetime")
        {
            PublishedDatetime_Index = index;
        }
        if (column == "Market_Is_Open")
        {
            Market_Is_Open_Index = index;
        }
        if (column == "AuditId")
        {
            AuditId_Index = index;
        }
        if (column == "Row_Update_Version")
        {
            Row_Update_Version_Index = index;
        }
        if (column == "DontPublish")
        {
            DontPublish_Index = index;
        }
        index++;
    }
}
#endregion
#region StoreValues
private void StoreValues(IEnumerable<string[]> finalLines, string file)
{
    log.InfoFormat("Total final lines sent for storing {0}", finalLines.Count());
    Parallel.ForEach(finalLines, new ParallelOptions { MaxDegreeOfParallelism = MaxThreads }, (line) =>
    {
        var prices = new Prices();
        var datetime = line[PublishedDatetime_Index].Contains("+")
            ? line[PublishedDatetime_Index].Remove(line[PublishedDatetime_Index].IndexOf("+", StringComparison.Ordinal))
            : line[PublishedDatetime_Index];
        if (!IsNullOrEmpty(datetime))
        {
            prices.PublishedDatetime = Convert.ToDateTime(datetime);
        }
        if (!IsNullOrEmpty(line[MarketId_Index]))
        {
            prices.MarketId = Convert.ToInt32(line[MarketId_Index]);
        }
        if (!IsNullOrEmpty(line[Bid_Index]))
        {
            prices.Bid = Convert.ToDecimal(line[Bid_Index]);
        }
        if (!IsNullOrEmpty(line[Ask_Index]))
        {
            prices.Ask = Convert.ToDecimal(line[Ask_Index]);
        }
        if (!IsNullOrEmpty(line[Mid_Index]))
        {
            prices.Mid = Convert.ToDecimal(line[Mid_Index]);
        }
        if (!IsNullOrEmpty(line[Is_Indicative_Index]))
        {
            prices.Is_Indicative = Convert.ToBoolean(line[Is_Indicative_Index]);
        }
        else
        {
            prices.Is_Indicative = false;
        }
        if (!IsNullOrEmpty(line[Price_Engine_Index]))
        {
            prices.Price_Engine = Convert.ToString(line[Price_Engine_Index]);
        }
        if (!IsNullOrEmpty(line[Market_Is_Open_Index]))
        {
            prices.Market_Is_Open = line[Market_Is_Open_Index] == "1";
        }
        if (!IsNullOrEmpty(line[AuditId_Index]))
        {
            prices.AuditId = Convert.ToString(line[AuditId_Index]);
        }
        if (!IsNullOrEmpty(line[Row_Update_Version_Index]))
        {
            prices.Row_Update_Version = Convert.ToString(line[Row_Update_Version_Index]);
        }
        if (!IsNullOrEmpty(line[DontPublish_Index]))
        {
            if (DontPublish_Index != 0)
            {
                prices.DontPublish = line[DontPublish_Index] == "1";
            }
        }
        prices.SbProdFile = file;
        lock (PriceCollection)
        {
            PriceCollection.Add(prices);
        }
    });
}
I don't see how Parallel.ForEach could help improve performance when you need to process a single file.
Don't use File.ReadLines(file).ToList(): either use ReadAllLines if you want all lines in memory, or use ReadLines if you want to process the lines one after another.
Why do you split each line multiple times?
Use a HashSet<string> for distinctMarketIds.
This should be more efficient:
var marketIdSet = new HashSet<string>(
    OrderEntityColection
        .FindAll(obj => obj.OrderLastChangeDateTimeUtc >= publishDateTimeFirstLine
                     && obj.OrderLastChangeDateTimeUtc <= publishDateTimeLastLine)
        .Select(obj => obj.MarketId.ToString()));
IEnumerable<string[]> allFields = File.ReadLines(file)
    .Select(line => line.Split(','))
    .Where(arr => marketIdSet.Contains(arr[0]));
Note that due to the deferred execution of Select and Where this is just a query; it is not yet executed. Whenever you use allFields, you will execute the query again. So it's a good idea to create a collection, e.g. with allFields.ToList(), which you then pass to StoreValues:
StoreValues(allFields.ToList(), file); //store the prices
If you pass a collection you could really benefit from using Parallel.ForEach in StoreValues.
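The advice above (split each line once, filter with a HashSet, materialize before the parallel loop) can be sketched end to end. The sample lines and market ids below are hypothetical stand-ins for the question's data; the ConcurrentBag is one way to avoid taking a lock around every Add:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Hypothetical stand-ins for the CSV lines and the distinct market ids.
        var lines = new[] { "1,100.5", "2,99.0", "1,101.2", "3,98.7" };
        var marketIdSet = new HashSet<string> { "1", "3" };

        // Split each line exactly once, filter, and materialize with ToList()
        // so the query is not re-executed by every later enumeration.
        List<string[]> finalLines = lines
            .Select(l => l.Split(','))
            .Where(ps => marketIdSet.Contains(ps[0]))
            .ToList();

        // A ConcurrentBag is thread-safe, so the parallel body needs no lock.
        var prices = new ConcurrentBag<string[]>();
        Parallel.ForEach(finalLines, line => prices.Add(line));

        Console.WriteLine(prices.Count); // 3 matching rows
    }
}
```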
static void Main()
{
    //var lines = File.ReadLines(file).ToList();
    // this is just fast generation of sample data
    var lines = Enumerable.Range(0, 500000)
        .Select(i => string.Join(",", i % 7, i, i & 2))
        .ToList();
    // a HashSet works as an indexed store and will match faster in your loop
    var distinctMarketIds = new HashSet<string> { "0", "2", "3", "5" };
    // Do this if you want to use the method syntax instead of the query syntax:
    // var newList = lines.Select(l => l.Split(','))
    //                    .Where(ps => distinctMarketIds.Contains(ps[0]));
    var newList = from l in lines
                  // this parses the string once, versus twice as you were doing before
                  let ps = l.Split(',')
                  where distinctMarketIds.Contains(ps[0])
                  select ps;
    // can't see the content of your `StoreValues` method, but writing to a
    // file in parallel will never work as expected
    using (var stream = new StreamWriter("outfile.txt"))
        foreach (var l in newList)
            stream.WriteLine(string.Join(";", l));
}

Most efficient way to make duplicates unique in collection

I have a collection. And in this collection, if a duplicate is added, I want to append the text " - N" (Where N is an integer that is not used by a current item in the collection).
For Example, if I have the following list:
item1
item2
and try to add 'item1' again, I want the list to end up like so:
item1
item2
item1 - 1
If I try to add 'item1' again, the list will then be:
item1
item2
item1 - 1
item1 - 2
Pretty straightforward. Below is my simple algorithm, but I'm seeing a noticeable loss in performance when dealing with 10,000 items. Obviously that's going to happen to some extent, but are there better approaches? I couldn't find a similar question, so I figured I'd see if anyone has run into a similar issue.
Item copyItem = new Item();
string tempName = name;
int copyNumber = 1;
while (copyItem != null)
{
    copyItem = MyCollection.FirstOrDefault(blah => blah.Name == tempName);
    if (copyItem == null)
    {
        name = tempName;
        break;
    }
    tempName = name + " - " + copyNumber;
    ++copyNumber;
}
I would first sort the values; that way you only need to compare each value with the previous one rather than with the whole collection.
So it could look like this:
List<string> values = new List<string> { "item1", "item1", "item1" };
values.Sort();
string previousValue = string.Empty;
int number = 1;
for (int i = 0; i < values.Count; i++)
{
    if (values[i].Equals(previousValue))
    {
        previousValue = values[i];
        values[i] = values[i] + "-" + number;
        number++;
    }
    else
    {
        previousValue = values[i];
        number = 1;
    }
}
I would use a Dictionary<string, int> to store the number of duplicates for a particular item. So a helper method would look something like this:
Dictionary<string, int> countDictionary = new Dictionary<string, int>(); // case sensitive!
string GetNameForItem(string itemName)
{
    var name = itemName;
    var count = 0;
    countDictionary.TryGetValue(itemName, out count);
    if (count > 0)
        name = string.Format("{0} - {1}", itemName, count);
    countDictionary[itemName] = count + 1;
    return name;
}
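A quick usage sketch of the helper above, wrapped in a runnable program (the item names are hypothetical, matching the question's example):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static Dictionary<string, int> countDictionary = new Dictionary<string, int>();

    static string GetNameForItem(string itemName)
    {
        var count = 0;
        countDictionary.TryGetValue(itemName, out count);
        // First occurrence keeps the plain name; duplicates get " - N" appended.
        var name = count > 0 ? string.Format("{0} - {1}", itemName, count) : itemName;
        countDictionary[itemName] = count + 1;
        return name;
    }

    static void Main()
    {
        Console.WriteLine(GetNameForItem("item1")); // item1
        Console.WriteLine(GetNameForItem("item2")); // item2
        Console.WriteLine(GetNameForItem("item1")); // item1 - 1
        Console.WriteLine(GetNameForItem("item1")); // item1 - 2
    }
}
```

Each lookup is a dictionary access, so adding N items is O(N) overall, versus the O(N²) of scanning the collection for every insert.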
Alternatively, you could split up the operation into several methods if you didn't want GetNameForItem to automatically increment on retrieval:
int GetCountForItem(string itemName)
{
    var count = 0;
    countDictionary.TryGetValue(itemName, out count);
    return count;
}
string GetNameForItem(string itemName)
{
    var name = itemName;
    var count = GetCountForItem(itemName);
    if (count > 0)
        name = string.Format("{0} - {1}", itemName, count);
    return name;
}
int IncrementCountForItem(string itemName)
{
    var newCount = GetCountForItem(itemName) + 1;
    countDictionary[itemName] = newCount;
    return newCount;
}
It is important to note that if you are supporting deletion from the collection, you will have to update the count accordingly:
int DecrementCountForItem(string itemName)
{
    var newCount = Math.Max(0, GetCountForItem(itemName) - 1); // Prevent the count from going negative!
    countDictionary[itemName] = newCount;
    return newCount;
}
You will also have to keep in mind what happens if you have two items, say "Item A" and "Item A - 1", then you delete "Item A". Should you rename "Item A - 1" to "Item A"?
Okay, so you need a counter per value rather than a global one. This code will do it:
// Inputs for test purposes
List<string> values = new List<string> { "item1", "item2", "item1", "item1" };
// Result data
List<string> finalResult = new List<string>();
// 1 - Group by item value
var tempResult = from i in values
                 group i by i;
// Loop over each distinct item name
foreach (var curItem in tempResult)
{
    // Thanks to the group by, we know how many items with the same name exist
    for (int ite = 0; ite < curItem.Count(); ite++)
    {
        if (ite == 0)
            finalResult.Add(curItem.Key);
        else
            finalResult.Add(string.Format("{0} - {1}", curItem.Key, ite));
    }
}
Thanks to LINQ you can reduce the amount of code. The next snippet does exactly the same thing and should also be quicker, because the ToList() call means the LINQ query's execution is not deferred.
// Inputs for test purposes
List<string> values = new List<string> { "item1", "item2", "item1", "item1" };
// Result data
List<string> finalResult = new List<string>();
values.GroupBy<string, string>(s1 => s1).ToList().ForEach(curItem =>
{
    for (int ite = 0; ite < curItem.Count(); ite++)
    {
        finalResult.Add(ite == 0 ? curItem.Key : string.Format("{0} - {1}", curItem.Key, ite));
    }
});

Three combination of dimension array in C#?

How do I create three combinations of a dimension array in C#? I am getting the error message:
index was outside of the bounds of the array.
foreach (XmlNode RegexExpression in XmlDataAccess.GetElementList(RefFile, "//regex"))
{
    xRefList.Add(RegexExpression.InnerText);
}
foreach (XmlNode RegexExpression in XmlDataAccess.GetElementList(RefFile, "//word"))
{
    WordList.Add(RegexExpression.InnerText);
}
foreach (XmlNode RegexExpression in XmlDataAccess.GetElementList(RefFile, "//title"))
{
    TitleList.Add(RegexExpression.InnerText);
}
ArrayList xRefResult = MainDocumentPart_Framework.getReferenceContent(FileName, xRefList);
ArrayList TitleResult = MainDocumentPart_Framework.getReferenceContent(FileName, TitleList);
ArrayList WordResult = MainDocumentPart_Framework.getReferenceContent(FileName, WordList);
var FinalResult = from first in TitleResult.ToArray()
                  from second in WordList.ToArray()
                  from third in xRefResult.ToArray()
                  select new[] { first, second, third };
foreach (var Item in FinalResult)
{
    System.Windows.MessageBox.Show(Item.ToString());
    // I would like to see all the combinations of the arrays:
    // first1, second1, third1
    // first1, second1, third2
    // first1, second1, third3 ...
}
I'm not really sure what kind of output you're after, and I don't think you need to use LINQ for this.
string outputStr = "";
for (int x = 0; x < xRefList.Count; x++)
{
    for (int y = 0; y < WordList.Count; y++)
    {
        for (int z = 0; z < TitleList.Count; z++)
        {
            outputStr += xRefList[x] + " " + WordList[y] + " " + TitleList[z] + "\n";
        }
    }
}
MessageBox.Show(outputStr);
Would something like this work?
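As a side note, repeated += on a string copies the whole buffer each time, which gets expensive for large cross products; a StringBuilder (or LINQ's SelectMany) keeps it linear. A sketch with hypothetical small inputs standing in for xRefList, WordList, and TitleList:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

class Program
{
    static void Main()
    {
        // Hypothetical inputs standing in for the question's three lists.
        var xRefList  = new List<string> { "x1", "x2" };
        var wordList  = new List<string> { "w1" };
        var titleList = new List<string> { "t1", "t2" };

        // SelectMany produces the full cross product (2 * 1 * 2 = 4 combinations);
        // StringBuilder avoids the quadratic cost of repeated string concatenation.
        var sb = new StringBuilder();
        foreach (var combo in xRefList.SelectMany(
                     x => wordList.SelectMany(
                     w => titleList.Select(t => $"{x} {w} {t}"))))
        {
            sb.AppendLine(combo);
        }
        Console.Write(sb.ToString());
    }
}
```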

How to merge values of two lists together

For example I have:
public static List<int> actorList = new List<int>();
public static List<string> ipList = new List<string>();
They both contain various items.
So I tried joining the values (string and int) together using a foreach loop:
foreach (string ip in ipList)
{
    foreach (int actor in actorList)
    {
        string temp = ip + " " + actor;
        finalList.Add(temp);
    }
}
foreach (string final in finalList)
{
    Console.WriteLine(final);
}
Although, looking back at this, it was pretty naive and obviously will not work, since the second loop is nested inside the first rather than running alongside it.
My expected values for finalList list:
actorListItem1 ipListItem1
actorListItem2 ipListItem2
actorListItem3 ipListItem3
and so on..
So the values from the two lists are concatenated pairwise, corresponding to their positions in the lists.
Use LINQ's Zip method:
List<string> finalList = actorList.Zip(ipList, (x, y) => x + " " + y).ToList();
finalList.ForEach(x => Console.WriteLine(x)); // for displaying
Or combine them into one line:
actorList.Zip(ipList, (x, y) => x + " " + y).ToList().ForEach(x => Console.WriteLine(x));
What about some functional goodness?
listA.Zip(listB, (a, b) => a + " " + b)
Assuming you can use .NET 4, you want to look at the Zip extension method and its documented example:
int[] numbers = { 1, 2, 3, 4 };
string[] words = { "one", "two", "three" };
// The following example concatenates corresponding elements of the
// two input sequences.
var numbersAndWords = numbers.Zip(words, (first, second) => first + " " + second);
foreach (var item in numbersAndWords)
    Console.WriteLine(item);
Console.WriteLine();
In this example, because there is no corresponding entry for 4 in words, it is omitted from the output. You would need to check that the collections are the same length before you start.
Loop over the indexes:
for (int i = 0; i < ipList.Count; ++i)
{
    string temp = ipList[i] + " " + actorList[i];
    finalList.Add(temp);
}
You may also want to add code before this to verify that the lists are the same length:
if (ipList.Count != actorList.Count)
{
    // throw some suitable exception
}
for (int i = 0; i < actorList.Count; i++)
{
    finalList.Add(actorList[i] + " " + ipList[i]);
}
