HashMap. Create entry (int, string[]) and push into value immediately - c#

I want to group all files of the same size into buckets, with the size as the key.
The default behaviour overwrites the value whenever you assign to an existing key. Instead, I want to append to the string[] array whenever the same key is met.
1556 - "1.txt" - Entry added to the dictionary, 1.txt put to the string[]
1556 - "7.txt" - 7.txt pushed to the string[] associated with 1556
My current thought is to enumerate once through all the files and create entries with keys and empty lists in the dictionary:
foreach (var file in directory)
{
    map[file.Length] = new List<string>();
}
then enumerate a second time, retrieving the list associated with the current key:
foreach (var file in directory)
{
    map[file.Length].Add(file.Name);
}
Are there any better ways to do this?

As far as I understand, you want entries with fileSize as the key and a fileNames collection as the value.
In that case, I'd suggest using just one loop, like so:
foreach (var file in directory)
{
    if (!map.ContainsKey(file.Length))
    {
        map.Add(file.Length, new List<string>());
    }
    map[file.Length].Add(file.Name);
}
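As a further sketch (my addition, not part of the original answer): TryGetValue avoids looking the key up twice, and LINQ's GroupBy can build the whole map in one expression. The file names and sizes below are illustrative stand-ins for real FileInfo objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Sample (name, size) pairs standing in for FileInfo objects.
        var files = new (string Name, long Length)[]
        {
            ("1.txt", 1556), ("7.txt", 1556), ("a.txt", 10)
        };

        // TryGetValue avoids the double lookup of ContainsKey + indexer.
        var map = new Dictionary<long, List<string>>();
        foreach (var file in files)
        {
            if (!map.TryGetValue(file.Length, out var names))
            {
                map[file.Length] = names = new List<string>();
            }
            names.Add(file.Name);
        }
        Console.WriteLine(string.Join(",", map[1556])); // 1.txt,7.txt

        // The same map built in one LINQ expression:
        var bySize = files.GroupBy(f => f.Length)
                          .ToDictionary(g => g.Key,
                                        g => g.Select(f => f.Name).ToList());
        Console.WriteLine(string.Join(",", bySize[1556])); // 1.txt,7.txt
    }
}
```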

Related

How to read records from last to top in stack?

Below is my variable of type Stack:
var records = new Stack<Records>();
Suppose records contains data like below for eg:
3
2
1
5
8
Now I want to loop on this above variable but in reverse order like below:
foreach (var item in records)
{
// item should have 8 then 5 then 1......
}
Note: I don't want the overhead of converting this into an IEnumerable or List and then looping over it in reverse.
You can use Reverse():
foreach (var item in records.Reverse()) {
...
Additional info for OP:
Stack is implemented as an array.
Stacks and queues are useful when you need temporary storage for
information; that is, when you might want to discard an element after
retrieving its value. Use Queue if you need to access the
information in the same order that it is stored in the collection. Use
System.Collections.Generic.Stack if you need to access the
information in reverse order.
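A small self-contained sketch of the behaviour described above (my addition, not from the original answer). One caveat for the OP's "no overhead" requirement: Enumerable.Reverse buffers the sequence into an array internally, so there is no allocation-free reverse walk of a Stack<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // The constructor pushes 8 first, so 8 ends up at the bottom.
        var records = new Stack<int>(new[] { 8, 5, 1, 2, 3 });

        // foreach enumerates a Stack<T> top-down (last pushed first):
        Console.WriteLine(string.Join(" ", records));           // 3 2 1 5 8

        // Enumerable.Reverse walks it bottom-up instead:
        Console.WriteLine(string.Join(" ", records.Reverse())); // 8 5 1 2 3
    }
}
```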

Dictionary runs slow when using ContainsValue()

I have a HashSet containing custom objects generated from reading a binary file. I also have a dictionary generated from reading each row of a DBF file. There's an index property on both that line up with each other. For example, the 10th item in my Dictionary will line up with the 10th item in my HashSet.
I am comparing LARGE amounts of data against each other. There can be anywhere from 10,000 to 500,000 records. The application checks the other two files (one binary, the other a DBF) for differences. It compares the HashCode of each object (generated from certain properties), which makes that part of the comparison fast and easy.
Here is how I build each individual dictionary (there is a similar one for mod as well):
int d = 0; // running row index
foreach (DataRow row in origDbfFile.datatable.Rows)
{
    string str = "";
    foreach (String columnName in columnNames)
    {
        str += "~" + row.Field<Object>(columnName);
    }
    origDRdict.Add(d, str);
    d++;
}
The columns between the two files will always be the same. However I can run into two different files with different columns. I essentially output all data into a string for dictionary lookup. I only want to hit the DBF file again if the data is different.
Here is my code for the DB lookup. It does find differences; it's just really slow when it runs the ELSE section of my (!foundIt) if block. If I remove that block, it takes only one minute to list all not-found items.
int i = 0;
foreach (CustomClass customclass in origCustomClassList)
{
    Boolean foundIt = false;
    if (modCustomClassList.Contains(customclass))
    {
        foundIt = true;
    }
    //at this point, an element has not been found
    if (!foundIt)
    {
        notFoundRecords.Add(customclass);
    }
    //If I remove this entire else block, code runs fast.
    else //at this point an element has been found
    {
        //check 'modified' dictionary
        if (!(modDRdict.ContainsValue(origDRdict[i])))
        {
            //at this point, the coordinates are the same,
            //however there are DB changes
            //this is where I would do a full check based on indexes
            //to show changes.
        }
    }
    i++; //since hashsets can't be indexed, we need to increment
}
What I've tried / Other Thoughts
Generating a HashSet of custom objects, each holding an integer index and the string concatenated from the column values
Removing if (!(modDRdict.ContainsValue(origDRdict[i]))) block makes code significantly quicker. Time to iterate removed records between two 440,000 record files only takes one minute. The dictionary lookup is taking forever!
I don't think the foreach loop within the foreach loop is causing too much overhead. If I keep it in the code, but don't do a lookup then it still runs quick.
Dictionaries are optimized to look up by key, not by value. If you need to look up by value, you're using the wrong dictionary. You'll need to build either a HashSet on your values to quickly check for containment, or build a reverse dictionary if you need the keys.
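A minimal sketch of that reverse-lookup fix (my code, with stand-in data; modDRdict mirrors the dictionary from the question): build a HashSet of the values once before the comparison loop, so each membership test is O(1) instead of ContainsValue's O(n) scan over all entries.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Stand-in for the real dictionary of concatenated row strings.
        var modDRdict = new Dictionary<int, string>
        {
            { 0, "~A~1" }, { 1, "~B~2" }
        };

        // Build once, outside the comparison loop.
        var modValues = new HashSet<string>(modDRdict.Values);

        // O(1) per lookup instead of scanning every value.
        Console.WriteLine(modValues.Contains("~A~1")); // True
        Console.WriteLine(modValues.Contains("~Z~9")); // False
    }
}
```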

Checking a File for matches

I am appending each line in a list to the end of a file...
But what I would like to do is check whether the file already contains that line, so the same line is not saved twice.
So before using StreamWriter to write the file I want to check each item in the list to see if it exists in the file. If it does, I want to remove it from the list before using StreamWriter.
..... Unless of course there is a better way to go about doing this?
Assuming your files are small and you are limited to flat files (a database table is not an option, etc.), you could read the existing items into a list and then make the write conditional on examining that list. Again, I would try another method if at all possible (a DB table, etc.), but here is the most direct answer to your question:
string line = "your line to append";

// Read existing lines into a list
List<string> existItems = new List<string>();
using (StreamReader sr = new StreamReader(path))
{
    while (!sr.EndOfStream)
    {
        existItems.Add(sr.ReadLine());
    }
}

// Conditionally append the new line to the file
if (!existItems.Contains(line))
{
    using (StreamWriter sw = new StreamWriter(path, true)) // true = append
    {
        sw.WriteLine(line);
    }
}
I guess what you could do is initialize the list from the file, adding each line as a new entry to the list.
Then, as you add to the list, check to see if it contains the line already.
List<string> l = new List<string> { "A", "B", "C" }; // initialized from the file
string s = "D"; // the new line you want to add
if (!l.Contains(s))
{
    l.Add(s);
}
When you are ready to save the file, just write out what is in the list.
This will be slow, especially if you have a lot of data in the list.
If possible, can you store all the lines in a database table with a primary key on the text column? Then add if the column value does not exist, and when you're done, dump the table to a text file? I think that's what I'd do.
I'd like to point out I don't think this is ideal, but it should be fairly performant (Using mssql syntax):
create table foo (
rowdata varchar(1000) primary key
);
-- for insertion (where #rowdata is new text line):
insert into foo (rowdata)
select #rowdata
where not exists(select 1 from foo where rowdata = #rowdata)
-- for output
select rowdata from foo;
If you can sort the file every time you save it, it would be much faster to determine whether a particular entry exists.
Also, a database table would be a good idea, as mentioned earlier: you can search the table for the entry to be added and insert it only if it does not exist.
It depends on whether you are after speed (a DB), a fast implementation (file access), or don't care (in-memory lists, until the file gets too big and it burns and crashes).
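Pulling the suggestions above together, here is a sketch (my code, using a temp file as a placeholder for the real one): read the existing lines into a HashSet for O(1) duplicate checks, then append only the lines that aren't already present.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
        string path = Path.GetTempFileName(); // placeholder for the real file
        File.WriteAllLines(path, new[] { "A", "B" });

        var newLines = new List<string> { "B", "C" };

        // HashSet.Add returns false for duplicates, so this both filters
        // against the file's contents and de-duplicates newLines itself.
        var existing = new HashSet<string>(File.ReadAllLines(path));
        File.AppendAllLines(path, newLines.Where(l => existing.Add(l)));

        Console.WriteLine(string.Join(",", File.ReadAllLines(path))); // A,B,C
    }
}
```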

C# and the CSV file

I formatted this data using C#, StreamReader, and StreamWriter, and created this CSV file. Here is the sample data:
A------B-C---D----------E------------F------G------H-------I
NEW, C,A123 ,08/24/2011,08/24/2011 ,100.00,100.00,X123456,276135
NEW, C,A125 ,08/24/2011,08/24/2011 ,200.00,100.00,X123456,276135
NEW, C,A127 ,08/24/2011,08/24/2011 , 50.00,100.00,X123456,276135
NEW, T,A122 ,08/24/2011,08/24/2011 , 5.00,100.00,X225511,276136
NEW, T,A124 ,08/24/2011,08/24/2011 , 10.00,100.00,X225511,276136
NEW, T,A133 ,08/24/2011,08/24/2011 ,500.00,100.00,X444556,276137
I would like the following output:
NEW, C,A123 ,08/24/2011,08/24/2011 ,100.00,100.00,X123456,276135
NEW, C,A125 ,08/24/2011,08/24/2011 ,200.00,100.00,X123456,276135
NEW, C,A127 ,08/24/2011,08/24/2011 , 50.00,100.00,X123456,276135
NEW, C,A001 ,08/24/2011,08/24/2011 ,350.00,100.00,X123456,276135
NEW, T,A122 ,08/24/2011,08/24/2011 , 5.00,100.00,X225511,276136
NEW, T,A124 ,08/24/2011,08/24/2011 , 10.00,100.00,X225511,276136
NEW, T,A001 ,08/24/2011,08/24/2011 , 15.00,100.00,X225511,276136
NEW, T,A133 ,08/24/2011,08/24/2011 ,500.00,100.00,X444556,276137
NEW, T,A001 ,08/24/2011,08/24/2011 ,500.00,100.00,X225511,276137
With each change in field "I", I would like to add a line: sum column F, put "A001" in column C, and copy the contents of the other fields into that newly added line.
The letters on the columns are for illustrative purposes only. There are no headers.
What should I do first? How do I sum column F, copy the contents of the other fields, and put "A001" in column C? How do I add a line with each change in I?
From your questions it doesn't sound like your data fits a flat-file format very well, or at least not a CSV (the analogy to a single DB table). Wanting to "copy the contents of the other fields into that newly added line" suggests to me that there is possibly a relationship that might be better expressed referentially rather than by copying the data to a new row.
Also keep in mind the requirement to sum according to column 'F' suggests that you will need to iterate over every row in order to calculate the sum.
If you decide to go a route other than a CSV, you might try a lightweight database solution such as SQLite. An alternative might be to look at XmlSerializer or DataContract (with a serializer) and just work with objects in your code. The objects can then be serialized to disk when you're done with them.
You could use a custom iterator block. Just as an example, this shows how to emit the extra line with "A001" in column C and the group's sum in column F whenever column I changes (and once more after the last group); keep in mind that each row should keep the same number of columns.
public IEnumerable<string> GetUpdatedLines(string fileName)
{
    string[] lastValues = null;
    decimal sum = 0;
    foreach (string line in File.ReadLines(fileName))
    {
        var values = line.Split(',');
        // column I is the last field (index 8)
        if (lastValues != null && values[8] != lastValues[8])
        {
            // group changed: emit the summary line for the previous group
            lastValues[2] = "A001 ";
            lastValues[5] = sum.ToString("0.00");
            yield return string.Join(",", lastValues);
            sum = 0;
        }
        sum += decimal.Parse(values[5]);
        yield return line;
        lastValues = values;
    }
    if (lastValues != null) // summary line for the final group
    {
        lastValues[2] = "A001 ";
        lastValues[5] = sum.ToString("0.00");
        yield return string.Join(",", lastValues);
    }
}
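If the per-group sums are all you need, LINQ's GroupBy is a compact alternative. This is a sketch of mine using two sample rows from the question; the column indexes assume the nine-column layout shown above (F at index 5, I at index 8), and InvariantCulture is used so the decimal point parses the same everywhere.

```csharp
using System;
using System.Globalization;
using System.Linq;

class Program
{
    static void Main()
    {
        // Two sample rows from the question's data.
        string[] lines =
        {
            "NEW, C,A123 ,08/24/2011,08/24/2011 ,100.00,100.00,X123456,276135",
            "NEW, C,A125 ,08/24/2011,08/24/2011 ,200.00,100.00,X123456,276135",
        };

        // Group on column I (index 8) and sum column F (index 5).
        var sums = lines.Select(l => l.Split(','))
                        .GroupBy(v => v[8])
                        .ToDictionary(
                            g => g.Key,
                            g => g.Sum(v => decimal.Parse(
                                     v[5], CultureInfo.InvariantCulture)));

        Console.WriteLine(sums["276135"]); // 300.00
    }
}
```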

Comparing Windows.Forms.ListViewItem Items for Uniqueness

I have a Windows.Forms.ListView where the user shall be able to add and remove entries. Particularly, those are files (with attributes) the user can pick through a dialog. Now, I want to check whether the file names / entries I get from the file picker are already in the list; in other words, there shall only be unique items in the ListView.
I could not find any way to compare ListViewItems to check whether the exact same entry and information is already present in my ListView. The only way I see now is to:
> Loop through the files I get from the picker (multiselect is true)
>   Loop through ListView.Items
>     compare ListViewItem.Text
>     Loop through ListViewItem.SubItems
>       compare .Text
If during the comparisons a complete match was found, the new entry is a duplicate and thus is not added afterwards.
This seems like an awful lot of effort to do something that I would find to be a function that is not so uncommon. Is there any other way to achieve this?
The file system itself uses only the filename to test for uniqueness, so you should do the same, no need to compare sub-items too.
Items in a ListView typically represent some object. What I usually do is to assign that object (or at least some value identifying the object) to the Tag property of the corresponding ListViewItem when they are added to the list. That way you get a quite simple setup where you can compare items by getting the values from the Tag property and perform the comparison on those objects instead of the list view representation of them.
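Following that Tag suggestion, here is a sketch (my code; the method name, the listView parameter, and the dialog are assumed names, and the full file path serves as the identifying value stored in Tag):

```csharp
using System.IO;
using System.Linq;
using System.Windows.Forms;

static class ListViewHelper
{
    // Adds only files whose paths are not already present in the list,
    // comparing against the path stored in each item's Tag.
    public static void AddUniqueFiles(ListView listView, OpenFileDialog dialog)
    {
        foreach (string path in dialog.FileNames)
        {
            bool alreadyListed = listView.Items
                .Cast<ListViewItem>()
                .Any(item => (string)item.Tag == path);
            if (!alreadyListed)
            {
                listView.Items.Add(new ListViewItem(Path.GetFileName(path))
                {
                    Tag = path
                });
            }
        }
    }
}
```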
