Massive list in C# & WPF

I have a list consisting of all the US Zip codes, each with 3 elements. Thus the list is ~45,000 x 3 strings. What is the best way to load this, essentially the most efficient/optimized? Right now I have a foreach loop running it, and every time it gets to the loading point it hangs. Is there a better approach?
Edit
The usage of this is for the user to be able to type in a zip code and have the city and state displayed in two other text boxes. Right now I have it set to check as the user types, and after the third number is entered it freezes up, I believe at the ZipCodes zips = new ZipCodes() line.
This is the code I'm currently using. I left one of the zipCodes.Add statements in but deleted the other 44,999.
struct ZipCode
{
    private readonly string cvZipCode;
    private readonly string cvCity;
    private readonly string cvState;

    public string ZipCodeID { get { return cvZipCode; } }
    public string City { get { return cvCity; } }
    public string State { get { return cvState; } }

    public ZipCode(string zipCode, string city, string state)
    {
        cvZipCode = zipCode;
        cvCity = city;
        cvState = state;
    }

    public override string ToString()
    {
        return City + ", " + State;
    }
}

// Requires: using System.Collections.Generic; using System.Linq;
class ZipCodes
{
    private List<ZipCode> zipCodes = new List<ZipCode>();

    public ZipCodes()
    {
        zipCodes.Add(new ZipCode("97475", "SPRINGFIELD", "OR"));
        // ...the remaining 44,999 Add calls are omitted here.
    }

    public IEnumerable<ZipCode> GetLocation()
    {
        return zipCodes;
    }

    public IEnumerable<ZipCode> GetLocationZipCode(string zipCode)
    {
        return zipCodes.Where(z => z.ZipCodeID == zipCode);
    }

    public IEnumerable<ZipCode> GetLocationCities(string city)
    {
        return zipCodes.Where(z => z.City == city);
    }

    public IEnumerable<ZipCode> GetLocationStates(string state)
    {
        return zipCodes.Where(z => z.State == state);
    }
}

private void LocateZipCode(TextBox source, TextBox destination, TextBox destination2 = null)
{
    // This runs on every keystroke, and every call rebuilds all ~45,000 entries.
    ZipCodes zips = new ZipCodes();

    if (source.Text.Length == 5)
    {
        string tempZipCode = source.Text;
        dataWorker.RunWorkerAsync(); // dataWorker: a BackgroundWorker defined elsewhere (not shown)

        // FirstOrDefault returns default(ZipCode), whose ZipCodeID is null, when nothing matches.
        ZipCode match = zips.GetLocationZipCode(tempZipCode).FirstOrDefault();
        if (match.ZipCodeID == null)
        {
            destination.Text = "Invalid Zip Code";
            if (destination2 != null)
            {
                destination2.Text = "";
            }
            return;
        }

        destination.Text = match.City;
        if (destination2 != null)
        {
            destination2.Text = match.State;
        }
    }
    else if (destination2 != null)
    {
        destination2.Text = "";
    }
}

There are several options that depend on your use case and target client machines.
Use paged controls. Use existing control variants that support paging (e.g. Telerik's). This way you only deal with a smaller subset of the available data.
Use search/filter controls. Force users to enter partial data to reduce the size of the data you need to show.
Using ObservableCollection<T> can cause performance problems, because the framework-provided class does not support bulk loading. Make your own observable collection that supports bulk loading (one that does not raise a CollectionChanged event for every element you add). On a list of 5,000-10,000 members I've seen loading times drop from 3 s to 0.03 s.
Use async operations when loading data from the database. This way the UI stays responsive and you have a chance to inform users about the current operation. This improves the perceived performance immensely.
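A minimal sketch of such a bulk-loading collection, assuming a subclass of ObservableCollection<T> is acceptable in your project. The class name BulkObservableCollection and its AddRange method are illustrative, not framework APIs. It adds items to the protected Items list (which raises no events) and then raises a single Reset notification for the whole batch:

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Collections.Specialized;
using System.ComponentModel;

// Hypothetical helper type: not part of the framework.
public class BulkObservableCollection<T> : ObservableCollection<T>
{
    public void AddRange(IEnumerable<T> items)
    {
        // Guard against modification while a CollectionChanged handler is running.
        CheckReentrancy();

        foreach (T item in items)
        {
            // Items is the protected backing list; adding to it raises no events.
            Items.Add(item);
        }

        // One round of notifications for the whole batch instead of one per element.
        OnPropertyChanged(new PropertyChangedEventArgs("Count"));
        OnPropertyChanged(new PropertyChangedEventArgs("Item[]"));
        OnCollectionChanged(new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
    }
}
```

A Reset tells bound controls to re-read the whole collection, which is usually what you want after a large initial load.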

Instead of loading all of the items, try loading on demand. For instance, when the user enters the first three characters, query the list and return only the matching items. Many controls exist for this purpose, in both Silverlight and AJAX.
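One way to sketch the on-demand lookup without a special control, assuming the zip codes are kept in a List<string> sorted with ordinal comparison (ZipPrefixSearch and Match are hypothetical names): a binary search finds the first candidate, and the loop stops as soon as the prefix no longer matches, so only a small slice of the 45,000 entries is touched.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns at most `max` zip codes starting with `prefix`.
public static class ZipPrefixSearch
{
    // Assumes sortedZips is sorted with StringComparer.Ordinal and has no duplicates.
    public static List<string> Match(List<string> sortedZips, string prefix, int max = 20)
    {
        var results = new List<string>();

        // A negative result is the bitwise complement of the first element greater than prefix.
        int index = sortedZips.BinarySearch(prefix, StringComparer.Ordinal);
        if (index < 0)
        {
            index = ~index;
        }

        // Everything that starts with prefix sorts contiguously from here on.
        for (int i = index; i < sortedZips.Count && results.Count < max; i++)
        {
            if (!sortedZips[i].StartsWith(prefix, StringComparison.Ordinal))
            {
                break;
            }
            results.Add(sortedZips[i]);
        }
        return results;
    }
}
```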

Thanks for all the responses, I really do appreciate them. A couple I didn't really understand, but I know that's my own lack of knowledge in certain areas of C#. While researching them, though, I stumbled across a different solution that worked beautifully: using a Dictionary<TKey, TValue> instead of a List. Even without a BackgroundWorker, it loads at app start-up in about 5 seconds. I had heard of Dictionary<TKey, TValue> before, but until now had never had cause to use or research it, so this was doubly beneficial to me. Thanks again for all the assistance!
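A minimal sketch of the Dictionary approach described above. It assumes the zip data is available as lines like "97475,SPRINGFIELD,OR" (for example from File.ReadLines on a data file); that format and the ZipLookup name are assumptions, not the asker's actual code. Lookups go through TryGetValue, which is effectively constant time, instead of scanning a 45,000-entry list:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup type; the "zip,city,state" line format is an assumption.
public sealed class ZipLookup
{
    private readonly Dictionary<string, (string City, string State)> byZip =
        new Dictionary<string, (string City, string State)>(StringComparer.Ordinal);

    public ZipLookup(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            string[] parts = line.Split(',');
            if (parts.Length != 3)
            {
                continue; // skip malformed lines
            }
            byZip[parts[0]] = (parts[1], parts[2]);
        }
    }

    // Returns false (and null city/state) when the zip code is unknown.
    public bool TryFind(string zip, out string city, out string state)
    {
        if (byZip.TryGetValue(zip, out var entry))
        {
            city = entry.City;
            state = entry.State;
            return true;
        }
        city = state = null;
        return false;
    }
}
```

The important part is building the lookup once (at start-up, or in a static field) and reusing it from the text-changed handler, rather than constructing it on every keystroke.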

Related

How To Display All Desired Elements From A List<Object> To The Console (C#)

So I'm making a menu the user can use to view, add, and delete movies. Right now I have the menu completed to the point where, if the user enters 1, it should display the title, year, director, and summary of that movie on another screen. I have a foreach loop that I'm using to display that particular movie's title, year, director, and summary, but when I run my program it only shows the title and year. How can I show all four of them at once in the console? Code is below.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace MovieLibrary
{
    // Should contain data: an address and list of movie objects.
    // MUST include a dynamic 'MENU' of MOVIES
    public class Library
    {
        // Fields
        private string _directory = "../../output/";
        private string _file = "Movies.txt";
        private List<Movie> _movies;

        public Library()
        {
            _movies = new List<Movie>();
            Load();
            bool menuRunning = true;
            while (menuRunning)
            {
                Console.WriteLine("Pick a number. Any number.");
                string userOption = Console.ReadLine();
                while (string.IsNullOrWhiteSpace(userOption))
                {
                    Console.WriteLine("Please do not leave this blank.");
                    Console.WriteLine("Pick a number. Any number.");
                    userOption = Console.ReadLine();
                }
                if (userOption == "1")
                {
                    string montyTitle;
                    int montyYear;
                    string montyDirector;
                    string montySummary;
                    foreach (Movie movie in _movies)
                    {
                        if (movie.Title == "Monty Python and the Holy Grail")
                        {
                            montyTitle = movie.Title;
                            Console.WriteLine("Title: {0}", montyTitle);
                        }
                        if (movie.Year == 1975)
                        {
                            montyYear = movie.Year;
                            Console.WriteLine("Year: {0}", montyYear);
                        }
                        if (movie.Director == "Terry Gilliam & Terry Jones")
                        {
                            montyDirector = movie.Director;
                            Console.WriteLine("Director: {0}", montyDirector);
                        }
                        if (movie.Summary == "Monty Python and the Holy Grail is about a ragtag group, the knights of the round table, assembled by the Great King Arthur to embark on a quest given by God to find the Holy Grail.")
                        {
                            montySummary = movie.Summary;
                            Console.WriteLine("Summary: {0}\r\n", montySummary);
                        }
                    }
                }
                else if (userOption == "2")
                {
                    // RE
                }
                else if (userOption == "3")
                {
                    // Alien
                }
                else if (userOption == "4")
                {
                    // DP
                }
                else if (userOption == "5")
                {
                    // The Avengers
                }
                else if (userOption == "6")
                {
                    // Zombieland
                }
                else if (userOption == "7")
                {
                    // BTLC
                }
                else if (userOption == "8")
                {
                    // The Thing
                }
                else if (userOption == "9")
                {
                    // CITW
                }
            }
        }

        //Loads the text file
        private void Load()
        {
            using (StreamReader sr = new StreamReader(_directory + _file))
            {
                string text;
                while ((text = sr.ReadLine()) != null)
                {
                    string[] contents = text.Split(':');
                    Movie newLibrary = new Movie(contents[0], int.Parse(contents[1]), contents[2], contents[3]);
                    _movies.Add(newLibrary);
                }
            }
        }

        // Allows the user to view the list of movies
        private void View()
        {
            Console.Clear();
            foreach (Movie movie in _movies)
            {
                Console.WriteLine($"{movie.Title,-10}{"\n" + movie.Year,-10}{"\n" + movie.Director,-10}{"\n" + movie.Summary + "\r\n",-10}");
            }
        }
    }

    // Allows the user to add any new movies they would like to the list of movies in the text file
    /*public static void Add()
    {
    }
    // Allows the user to remove any movies they would like from the text file
    public static void Remove()
    {
    }*/
}
I do not really understand your question.
It might be that the condition if (movie.Director == "Terry Gilliam & Terry Jones") never evaluates to true, so execution never enters that if block.
Maybe I can propose something different.
You do not need the foreach loop to iterate the list.
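You can get items from your _movies list in two different ways. Note that userOption is a string, so convert it first, e.g. int index = int.Parse(userOption) - 1; (the - 1 is needed because your menu starts at 1 while list indexes start at 0):
Convert the list to an array, var _myMoviesArray = _movies.ToArray();, and access an item by index: Console.WriteLine(_myMoviesArray[index].Title);
Or index the list directly, since List<T> supports indexing: Console.WriteLine(_movies[index].Title);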
For printing in the console, instead of:
Console.WriteLine("Title: {0}", montyTitle);
you can do:
Console.WriteLine($"Title: {movie.Title}");
I also don't think you need the if (movie.Title == "Monty Python and the Holy Grail") check. If you access the list by the index from userOption, you get the Movie directly.
I'm not sure which part is bothering you exactly, but I was bored and decided to write a simple proof of concept (PoC) with better-separated code for you. I've defined a simple repository to manipulate your entity data and created a specific implementation of it for movies. The movies are stored in memory, but the storage could have any underlying implementation: a text file, for example, or you could keep everything in memory and save it to the library itself once you finish modifying the in-memory state. You can expand it, modify it, or discard it entirely if you don't like it.
Repository interface definition
namespace MovieLibrary
{
    using System;
    using System.Collections.Generic;

    internal interface IRepository<T, K>
    {
        IEnumerable<T> Get();
        IEnumerable<T> Get(Func<T, bool> condition);
        T Get(K guid);
        void Add(T entity);
        void Remove(T entity);
    }
}
Movie repository implementation
namespace MovieLibrary
{
    using System;
    using System.Collections.Generic;
    using System.Linq;

    internal class MovieRepository
        : IRepository<Movie, Guid>
    {
        // In-memory storage, can be replaced with anything else.
        private readonly ISet<Movie> movies;

        public MovieRepository(IEnumerable<Movie> initialState = null)
        {
            this.movies = initialState?.ToHashSet() ?? new HashSet<Movie>();
        }

        public void Add(Movie movie) => movies.Add(movie);

        public IEnumerable<Movie> Get() => movies;

        public IEnumerable<Movie> Get(Func<Movie, bool> condition) => movies.Where(condition);

        public Movie Get(Guid guid) => movies.SingleOrDefault(movie => movie.Id == guid);

        public void Remove(Movie movie) => movies.Remove(movie);
    }
}
Movie definition
namespace MovieLibrary
{
    using System;

    public class Movie
    {
        public Movie(string title, int year, string director, string summary)
        {
            this.Id = Guid.NewGuid();
            this.Title = title;
            this.Year = year;
            this.Director = director;
            this.Summary = summary;
        }

        public Guid Id { get; }
        public string Title { get; }
        public int Year { get; }
        public string Director { get; }
        public string Summary { get; }

        public override bool Equals(object other) => other is Movie movie && movie.Id == this.Id;

        public override int GetHashCode() => HashCode.Combine(Id);

        public override string ToString() => string.Format(
            "-- Details --{0}{0}Title: {1}{0}Year: {2}{0}Director: {3}{0}Summary: {4}",
            Environment.NewLine,
            this.Title,
            this.Year,
            this.Director,
            this.Summary);
    }
}
Library itself
namespace MovieLibrary
{
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Library
    {
        private readonly IRepository<Movie, Guid> repository;
        private readonly IDictionary<int, Movie> menuOptions;

        public Library()
        {
            this.repository = this.InitializeRepository();
            this.menuOptions = this.repository.Get()
                .Select((movie, index) =>
                    new { Ordinal = index + 1, Movie = movie })
                .ToDictionary(
                    item => item.Ordinal,
                    item => item.Movie);
        }

        public void Run()
        {
            Console.WriteLine("Welcome to the movie library! Pick a movie you'd like to watch...");
            foreach (KeyValuePair<int, Movie> option in menuOptions)
            {
                Console.WriteLine($"{option.Key}: {option.Value.Title}");
            }

            int selectedOrdinal = 0;
            Movie selectedMovie = null;
            do
            {
                Console.Write("Select option: ");
                string input = Console.ReadLine();
                if (!int.TryParse(input, out selectedOrdinal))
                {
                    Console.WriteLine("Please pick the existing option from the list...");
                    continue;
                }
                this.menuOptions.TryGetValue(selectedOrdinal, out selectedMovie);
            }
            while (selectedMovie == null);

            Console.WriteLine("Thank you for picking our movie library! Your movie will start in a second...");
            Console.WriteLine(selectedMovie);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }

        private IRepository<Movie, Guid> InitializeRepository()
        {
            // Mocking the initial data of library. You could load it from file instead.
            IEnumerable<Movie> movies = Enumerable
                .Range(1, 10)
                .Select(ordinal => new Movie(
                    $"Title {ordinal}",
                    DateTime.UtcNow.Year + ordinal,
                    $"Some Director {ordinal}",
                    $"Some dummy summary text {ordinal}"));
            return new MovieRepository(movies);
        }
    }
}
Can I get you to do one thing?
Turn the computer off (maybe after printing the problem statement) and sit down with a pencil and paper
Think about the problem in English (or your native language)
You have a file of movie info, one movie per line
You need to read the file, take input from the user that is a numerical index of the movie in the file lines, and show info about that movie in the console
The reason I'm asking you to turn the computer off is that, when you sit with Visual Studio open and a problem statement in front of you, there is a great temptation to start writing code before thinking about what needs to be written. It's like being asked to build a bridge and immediately marching off to the builder's yard to order 10,000 bricks. The first step in building a bridge is to find out all the requirements, draw it, make some calculations, and think about what it has to support now and in the future. Building a bridge is a big task; this programming task is relatively small, but it's big to you because you're just starting out. Once you learn the value of these pre-programming scoping exercises, I promise you'll use them throughout your entire career.
The problems will get bigger but the approach will remain the same. Putting your algorithm on paper will help immensely; think of it like writing an essay plan, or like forming a sentence in your mind in English (because you're a native English speaker), rearranging it into how a Spanish speaker would say it, then translating it into Spanish words and speaking it. Everything you do when engineering software is an exercise in translation: from headline overview, to detailed overview, to fine-grained processes, and eventually to computer code.
You may always think in English and have to translate your English thoughts into C# actions, so write the process down in English first. Eventually you'll think in C# for some things, but it'll always mix with English, especially when dealing with normal users.
Instead of pen and paper, you could also do this in C# comments and then translate underneath them (ending up with nicely commented code as a bonus), but as a beginner you'll get a lot more out of the visual exercise of pen and paper. You can easily create flow diagrams, side notes, and box-outs, and rub things out. The treeware approach always helps.
So your algorithm might look like this:
read all the lines out of the file
work through them one by one
turn the line into a movie object -- callout note to self, need a movie object, what does it look like?
add it to a collection of Movie objects
that's the reading part done -- should store that collection of movies somewhere. Should make that reading part a self contained thing
I now know how many movies I read out of the file because I have a collection with a count
ask the user for an index
print the movie at that index. No, wait. Stuff in C# is zero-based, so if the user wants movie 1, that will be at index 0 of the list/array/whatever -- remember to do a -1!!
ask the user for another index - this is repetitive, so I probably need a loop. Another note to self, need a way to get out of the loop
Then you might add:
printing a movie needs breaking down
should have a separate bit of code that prints a movie
oh, there was that lecture about ToString and how it can be used to make a string representation of a custom object. Note to self: make a ToString in Movie, then I can just print the movie object and it will be ToString-ed and nicely formatted automatically
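For illustration, here is a sketch of the "make a ToString in Movie" and "self-contained reading part" notes. The ReadLibrary name comes from this answer; having it take a TextReader rather than a file path, and the MovieFile class name, are assumptions that keep the sketch easy to test. It reads the same colon-separated format as the question's Load method, but splits into at most four parts so a summary may itself contain colons:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class Movie
{
    public Movie(string title, int year, string director, string summary)
    {
        Title = title;
        Year = year;
        Director = director;
        Summary = summary;
    }

    public string Title { get; }
    public int Year { get; }
    public string Director { get; }
    public string Summary { get; }

    // Console.WriteLine(movie) calls this automatically.
    public override string ToString() =>
        $"Title: {Title}{Environment.NewLine}Year: {Year}{Environment.NewLine}" +
        $"Director: {Director}{Environment.NewLine}Summary: {Summary}";
}

// Hypothetical container for the reading step.
public static class MovieFile
{
    // Each line: title:year:director:summary (the summary may contain ':').
    public static List<Movie> ReadLibrary(TextReader reader)
    {
        var movies = new List<Movie>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue; // skip blank lines
            }
            string[] parts = line.Split(new[] { ':' }, 4);
            movies.Add(new Movie(parts[0], int.Parse(parts[1]), parts[2], parts[3]));
        }
        return movies;
    }
}
```

With a real file you would call it as using (var reader = new StreamReader(path)) { var movies = MovieFile.ReadLibrary(reader); }, and printing movies[choice - 1] then shows all four fields.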
The next thing I'll ask you to do, and it's a big ask, is to temporarily (or perhaps permanently) set aside all the code you wrote already. Some bits are usable, some are a mess of jumbled thoughts with no clear algorithm
Your movie class is probably ok
Your View method contains bits that would be useful for ToString
The method that reads the file is OK too
Start simple, with a static void Main that first reads the movies (calls a ReadLibrary method that returns a List<Movie>) and then enters a loop that shows the user a menu
Ask for input, just the movie index to be printed - none of that saving, adding new movies etc. Gotta keep it simple to start
Print the movie out, loop round again
With what you already have (the reading code, the Movie class, and the View code turned into a ToString override), you should be able to complete the current task in about 10 lines of code or fewer. If you go significantly over this (and you currently have), then your thinking has gone wrong
For example, you're asking for input and then you're saying "if they entered 1 then else if they entered 2 then..."
Take their input, turn it into a number and show the movie at that number (less one) in the collection. This makes it truly dynamic. Consider what will happen if they enter 999999 for a laugh and your movie collection only has 10 movies. Put a check in to stop it breaking. You might go back to your paper and where it says "ask the user for an index" you might add "and ensure that it isn't a crazy value"
Have an AskString method that takes a string question, prints the question and asks for input and returns it
Have an AskInt method that uses the AskString method and parses its return value to an int. Extend it to take two more parameters besides the question: a lower bound and an upper bound. If the user enters a number outside the bounds, repeat the AskString call until they enter a sensible value
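A possible shape for the AskString and AskInt helpers described above. Only the two method names come from the answer; the Prompt class name and the decision to throw when input runs out (instead of looping forever) are assumptions:

```csharp
using System;

// Hypothetical container for the two console helpers.
public static class Prompt
{
    // Prints the question and returns the typed line (null once input has ended).
    public static string AskString(string question)
    {
        Console.Write(question + " ");
        return Console.ReadLine();
    }

    // Repeats AskString until the answer is an int within [lower, upper].
    public static int AskInt(string question, int lower, int upper)
    {
        while (true)
        {
            string answer = AskString(question);
            if (answer == null)
            {
                throw new InvalidOperationException("Input ended before a valid number was entered.");
            }
            if (int.TryParse(answer, out int value) && value >= lower && value <= upper)
            {
                return value;
            }
            Console.WriteLine($"Please enter a whole number from {lower} to {upper}.");
        }
    }
}
```

With these, the menu loop reduces to something like int choice = Prompt.AskInt("Which movie?", 1, movies.Count); followed by Console.WriteLine(movies[choice - 1]);.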
I cannot, in good conscience, do your homework for you, but I offer this answer to teach you how to solve your own problem. All the bits you need are already somewhere in what you've written, but they lack structure and forethought.
I'm happy to keep editing this answer to address any further questions you have; drop a comment saying what you need.
Based on my test, if your other movies share the same titles, years, etc., a check like if (movie.Title == "Monty Python and the Holy Grail") cannot pick out a single movie accurately.
Your while (menuRunning) { } loop can be changed to the following:
while (menuRunning)
{
    Console.WriteLine("Pick a number. Any number.");
    // Menu numbers start at 1, list indexes start at 0.
    if (int.TryParse(Console.ReadLine(), out int userOption)
        && userOption >= 1 && userOption <= _movies.Count)
    {
        Console.Clear();
        Movie movie = _movies[userOption - 1];
        Console.WriteLine("Title:{0}\nYear:{1}\nDirector:{2}\nSummary:{3}\n",
            movie.Title, movie.Year, movie.Director, movie.Summary);
    }
    else
    {
        Console.WriteLine("Please enter a number from 1 to {0}.", _movies.Count);
    }
}

What is the proper way to save data from a file to objects in C#?

What is the proper way to save all lines from a text file to objects? I have a .txt file like this:
0001Marcus Aurelius 20021122160 21311
0002William Shakespeare 19940822332 11092
0003Albert Camus 20010715180 01232
I know the position of every field in the file, and all the data is consistently formatted:
Line number is from 0 to 3
Book author is from 4 to 30
Publish date is from 31 to 37
Page num. is from 38 to 43
Book code is from 44 to 49
I made a class Data that holds a field's start position, end position, value, and error.
Then I made a class Line that holds a list of Data objects and a list of all errors found in that line. After loading the data from a line into the Data objects, I loop through them and add every error to lineError, because I need to save the errors from each line to a database.
Is this a proper way to load data from a file into objects and, after processing, save that data to a database? Any advice on a better approach?
public class Data
{
    public int startPosition = 0;
    public int endPosition = 0;   // inclusive, matching the positions listed above
    public object value = null;
    public string fieldName = "";
    public Error error = null;    // Error is the asker's own class (not shown)

    public Data(int start, int end, string name)
    {
        this.startPosition = start;
        this.endPosition = end;
        this.fieldName = name;
    }

    public void SetValueFromLine(string line)
    {
        // +1 because endPosition is inclusive (0 to 3 is four characters).
        string valueFromLine = line.Substring(this.startPosition, this.endPosition - this.startPosition + 1);
        // if/else statement that checks validity of data (length, empty value)
        this.value = valueFromLine;
    }
}

public class Line
{
    public List<Data> lineData = new List<Data>();
    public List<Error> lineError = new List<Error>();

    public Line()
    {
        AddObjectDataToList();
    }

    public void AddObjectDataToList()
    {
        lineData.Add(new Data(0, 3, "lineNumber"));
        lineData.Add(new Data(4, 30, "bookAuthor"));
        lineData.Add(new Data(31, 37, "publishDate"));
        lineData.Add(new Data(38, 43, "pageNumber"));
        lineData.Add(new Data(44, 49, "bookCode"));
    }

    public void LoadLineDataToObjects(string line)
    {
        foreach (Data s in lineData)
        {
            s.SetValueFromLine(line);
        }
    }

    public void GetAllErrorFromData()
    {
        foreach (Data s in lineData)
        {
            if (s.error != null)
            {
                lineError.Add(s.error);
            }
        }
    }
}

// Note: this name clashes with System.IO.File if System.IO is imported.
public class File
{
    public string fileName;
    public List<Line> lines = new List<Line>();
}
I assume that the focus is on using OOP. I also assume that parsing is a secondary task and I will not consider options for its implementation.
First of all, we need to determine the main acting object. Strange as it may seem, this is not a Book but the line itself (call it DataLine). Initially, I wanted to create a Book from a string (through a separate constructor), but that would be a mistake.
What actions should a DataLine be able to perform? In fact, only one: process. I see two acceptable options for this method:
process returns Book or throws exceptions. (Book process())
process returns nothing, but interacts with another object. (void process(IResults result))
The first option has the following drawbacks:
It is difficult to test (although this also applies to the second option), because all validation is hidden inside DataLine.
It is impossible/difficult to return a few errors.
The program is meant to handle incorrect data, so exceptions would be thrown routinely for expected situations. This goes against the idea that exceptions are for exceptional cases. There is also some concern about performance.
The second option avoids the last two drawbacks. IResults can contain the methods error(...), which can be called several times to report several errors, and success(Book book).
The testability of the process method can be significantly improved by adding an IValidator. This object could be passed as a parameter to the DataLine constructor, but that is not entirely correct. First, it would cost extra memory without giving us any tangible benefit. Second, it does not match the essence of the DataLine class: a DataLine represents only a line that can be processed in one particular way. So a good solution is void process(IValidator validator, IResults result).
Summarizing the above (a sketch):
interface IResults
{
    void error(string message);
    void success(Book book);
}

interface IValidator
{
    // just an example
    bool checkBookCode(string bookCode);
}

class DataLine
{
    private readonly string _rawData;

    public DataLine(string rawData)
    {
        _rawData = rawData;
    }

    public void process(IValidator validator, IResults result)
    {
        // Parse _rawData using the question's fixed positions (end inclusive).
        string author = _rawData.Substring(4, 27).Trim();
        string bookCode = _rawData.Substring(44, 6).Trim();

        bool isValid = true; // just an example! maybe better to add IResults.hasErrors()
        if (!validator.checkBookCode(bookCode))
        {
            result.error("Bad book code");
            isValid = false;
        }
        if (isValid)
        {
            // Book is assumed to be a simple class with an (author, bookCode) constructor.
            result.success(new Book(author, bookCode));
            // or even result.success(...) with raw values, to avoid coupling DataLine to Book
        }
    }
}
The next step is to create a model of the file with its lines. Here again there are many options and nuances, but I would like to draw attention to IEnumerable<DataLine>. Ideally, we would create a DataLines class that supports IEnumerable<DataLine> and loads from a file or from an IEnumerable<string>. However, this approach is relatively complex and redundant; it only makes sense in large projects. A much simpler version:
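// Requires: using System.Collections.Generic; using System.Linq;
interface IDataLinesProvider
{
    IEnumerable<DataLine> Lines();
}

class DataLinesFile : IDataLinesProvider
{
    private readonly string _fileName;

    public DataLinesFile(string fileName)
    {
        _fileName = fileName;
    }

    public IEnumerable<DataLine> Lines()
    {
        // ReadLines streams the file lazily; fully qualified to avoid a clash with a custom File class.
        return System.IO.File
            .ReadLines(_fileName)
            .Select(x => new DataLine(x));
    }
}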
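To show how the pieces could fit together, here is a self-contained sketch that keeps the answer's lower-case method names. To stay short, success receives only the book code instead of a Book, and FiveDigitValidator, CollectingResults and Importer are hypothetical. Errors are collected together with their line number, which is what the question needs for saving them to a database:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IResults
{
    void error(string message);
    void success(string bookCode);
}

public interface IValidator
{
    bool checkBookCode(string bookCode);
}

// Hypothetical rule: a valid book code is exactly five digits.
public class FiveDigitValidator : IValidator
{
    public bool checkBookCode(string bookCode) =>
        bookCode.Length == 5 && bookCode.All(char.IsDigit);
}

// Collects successes and errors; each error remembers its line number.
public class CollectingResults : IResults
{
    public int LineNumber { get; set; }
    public List<string> BookCodes { get; } = new List<string>();
    public List<(int Line, string Message)> Errors { get; } = new List<(int Line, string Message)>();

    public void error(string message) => Errors.Add((LineNumber, message));
    public void success(string bookCode) => BookCodes.Add(bookCode);
}

public class DataLine
{
    private readonly string _rawData;

    public DataLine(string rawData) => _rawData = rawData;

    public void process(IValidator validator, IResults result)
    {
        // The book code occupies positions 44-49 (inclusive) in the question's layout.
        if (_rawData.Length < 50)
        {
            result.error("Line too short");
            return;
        }
        string bookCode = _rawData.Substring(44, 6).Trim();
        if (validator.checkBookCode(bookCode))
            result.success(bookCode);
        else
            result.error("Bad book code: " + bookCode);
    }
}

public static class Importer
{
    public static CollectingResults Run(IEnumerable<string> lines)
    {
        var results = new CollectingResults();
        var validator = new FiveDigitValidator();
        foreach (string line in lines)
        {
            results.LineNumber++; // 1-based line numbers
            new DataLine(line).process(validator, results);
        }
        return results;
    }
}
```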
You can infinitely improve the code, introduce new and new abstractions, but here you must start from common sense and a specific problem.
P.S. Sorry for the "strange" English; Google doesn't always translate such complex topics correctly.

Design pattern for dynamic C# object

I have a queue that processes objects in a while loop. Items are added asynchronously somewhere else, like this:
myqueue.pushback(value);
And they are processed like this:
while (true)
{
    String path = queue.pop();
    if (process(path))
    {
        Console.WriteLine("Good!");
    }
    else
    {
        queue.pushback(path);
    }
}
Now, the thing is that I'd like to modify this to support a TTL-like (time to live) flag, so that a file path is re-added no more than n times.
How could I do this, while keeping the bool process(String path) function signature? I don't want to modify that.
I thought about keeping a map, or a list, that counts how many times the process function returned false for a path, and dropping the path at the n-th false. I wonder how this can be done more dynamically; preferably, I'd like the TTL to decrement itself automatically each time the path goes back through processing. I hope I am not talking trash.
Maybe using something like this:
class JobData
{
    public string path;
    public short ttl;

    public static implicit operator String(JobData jobData) { jobData.ttl--; return jobData.path; }
}
I like the idea of a JobData class, but there's already an answer demonstrating that, and the fact that you're working with file paths gives you another possible advantage. Certain characters are not valid in file paths, so you could choose one to use as a delimiter. The advantage here is that the queue type remains a string, so you would not have to modify any of your existing asynchronous code. You can see a list of reserved path characters here:
http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
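For our purposes, I'll use the percent (%) character. (Be aware that % is legal in Windows file names; a character that Windows forbids, such as |, would be a safer choice.) Then you can modify your code as follows, and nothing else needs to change: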
const int startingTTL = 100;
const string delimiter = "%";

while (true)
{
    string[] path = queue.pop().Split(delimiter.ToCharArray());
    // No delimiter yet means a first attempt; otherwise decrement the stored TTL.
    int ttl = path.Length > 1 ? int.Parse(path[1]) - 1 : startingTTL;
    if (process(path[0]))
    {
        Console.WriteLine("Good!");
    }
    else if (ttl > 0)
    {
        queue.pushback(string.Format("{0}{1}{2}", path[0], delimiter, ttl));
    }
    else
    {
        Console.WriteLine("TTL expired for path: {0}", path[0]);
    }
}
Again, from a pure architecture standpoint, a class with two properties is a better design... but from a practical standpoint, YAGNI: this option means you can avoid going back and changing other asynchronous code that pushes into the queue. That code still only needs to know about the strings, and will work with this unmodified.
One more thing: this is a fairly tight loop, prone to running away with a CPU core. Additionally, if this is the .NET Queue<T> type and your tight loop gets ahead of your asynchronous producers and empties the queue, Dequeue will throw an exception, which breaks out of the while(true) block. You can solve both issues with code like this:
// Requires: using System.Threading;
while (true)
{
    try
    {
        string[] path = queue.pop().Split(delimiter.ToCharArray());
        int ttl = path.Length > 1 ? int.Parse(path[1]) - 1 : startingTTL;
        if (process(path[0]))
        {
            Console.WriteLine("Good!");
        }
        else if (ttl > 0)
        {
            queue.pushback(string.Format("{0}{1}{2}", path[0], delimiter, ttl));
        }
        else
        {
            Console.WriteLine("TTL expired for path: {0}", path[0]);
        }
    }
    catch (InvalidOperationException)
    {
        // Queue.Dequeue throws InvalidOperationException if the queue is empty... sleep for a bit before trying again
        Thread.Sleep(100);
    }
}
If the constraint is that bool process(String path) cannot be touched or changed, then put the functionality into myqueue. You can keep its public signatures void pushback(string path) and string pop(), but track the TTL internally. You can either wrap the string paths in a JobData-like class that is added to the internal queue, or keep a secondary Dictionary keyed by path. Perhaps even something as simple as saving the last popped path: if the subsequent push is the same path, you can assume it was a rejected/failed item. In your pop method you can even discard a path that has been rejected too many times and internally fetch the next path, so the calling code is blissfully unaware of the issue.
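A sketch of the "track it inside myqueue" idea, using the last-popped-path heuristic from the paragraph above. TtlQueue and maxRetries are hypothetical names; pushback and pop keep the question's signatures, so the processing loop stays unchanged. A lock is used because producers add items asynchronously:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical queue: drops a path after it has been pushed back maxRetries times.
public class TtlQueue
{
    private readonly object _sync = new object();
    private readonly Queue<string> _items = new Queue<string>();
    private readonly Dictionary<string, int> _retries = new Dictionary<string, int>();
    private readonly int _maxRetries;
    private string _lastPopped;       // path handed out by the most recent pop()
    private bool _lastWasPushedBack;  // whether that path has been pushed back since

    public TtlQueue(int maxRetries) => _maxRetries = maxRetries;

    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }

    public void pushback(string path)
    {
        lock (_sync)
        {
            // Heuristic: pushing back the path that was just popped means processing failed.
            if (path == _lastPopped && !_lastWasPushedBack)
            {
                _lastWasPushedBack = true;
                int retries = _retries.TryGetValue(path, out int r) ? r + 1 : 1;
                if (retries > _maxRetries)
                {
                    _retries.Remove(path); // TTL expired: silently drop the path
                    return;
                }
                _retries[path] = retries;
            }
            _items.Enqueue(path);
        }
    }

    public string pop()
    {
        lock (_sync)
        {
            // The previous path was not pushed back, so it succeeded: forget its retry count.
            if (_lastPopped != null && !_lastWasPushedBack)
            {
                _retries.Remove(_lastPopped);
            }
            _lastPopped = _items.Dequeue(); // throws InvalidOperationException when empty, like Queue<T>
            _lastWasPushedBack = false;
            return _lastPopped;
        }
    }
}
```

The heuristic has the weakness implied above: if a producer pushes the same path right after it was popped and processed successfully, that push is counted as a retry.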
You could abstract/encapsulate the functionality of the "job manager". Hide the queue and implementation from the caller so you can do whatever you want without the callers caring. Something like this:
// Requires: using System.Collections.Generic; using System.Threading; using System.Threading.Tasks;
public static class JobManager
{
    private const short DEFAULT_TTL = 5; // example value
    private static readonly Queue<JobData> _queue = new Queue<JobData>();

    static JobManager() { Task.Factory.StartNew(() => { StartProcessing(); }); }

    public static void AddJob(string value)
    {
        //TODO: validate
        _queue.Enqueue(new JobData(value));
    }

    private static void StartProcessing()
    {
        while (true)
        {
            if (_queue.Count > 0)
            {
                JobData data = _queue.Dequeue();
                // process is your existing bool process(String path) method (not shown).
                if (!process(data.Path))
                {
                    data.TTL--;
                    if (data.TTL > 0)
                        _queue.Enqueue(data);
                }
            }
            else
            {
                Thread.Sleep(1000);
            }
        }
    }

    private class JobData
    {
        public string Path { get; set; }
        public short TTL { get; set; }

        public JobData(string value)
        {
            this.Path = value;
            this.TTL = DEFAULT_TTL;
        }
    }
}
Then your processing loop can handle the TTL value.
Edit: added a simple processing loop. This code isn't thread-safe, but it should give you the idea.
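Since the loop above is not thread-safe, here is a sketch of the same JobManager idea built on BlockingCollection<T> from System.Collections.Concurrent instead of Queue<T>. Take blocks while the queue is empty, so neither the Thread.Sleep polling nor the empty-queue exception is needed. Passing process in as a Func<string, bool>, making the class non-static, and the Stop method are assumptions made to keep the sketch self-contained:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Thread-safe sketch; _process stands in for the asker's bool process(string path).
public sealed class JobManager
{
    private sealed class JobData
    {
        public JobData(string path, int ttl) { Path = path; Ttl = ttl; }
        public string Path { get; }
        public int Ttl { get; set; }
    }

    private readonly BlockingCollection<JobData> _queue = new BlockingCollection<JobData>();
    private readonly CancellationTokenSource _stop = new CancellationTokenSource();
    private readonly Func<string, bool> _process;
    private readonly int _defaultTtl;

    public JobManager(Func<string, bool> process, int defaultTtl)
    {
        _process = process;
        _defaultTtl = defaultTtl;
        // LongRunning asks the scheduler for a dedicated thread for the consumer loop.
        Worker = Task.Factory.StartNew(Consume, TaskCreationOptions.LongRunning);
    }

    public Task Worker { get; }

    // Safe to call from any thread.
    public void AddJob(string path) => _queue.Add(new JobData(path, _defaultTtl));

    public void Stop() => _stop.Cancel();

    private void Consume()
    {
        try
        {
            while (true)
            {
                // Take blocks while the queue is empty: no busy loop, no Thread.Sleep.
                JobData job = _queue.Take(_stop.Token);
                if (_process(job.Path))
                {
                    continue;
                }
                job.Ttl--;
                if (job.Ttl > 0)
                {
                    _queue.Add(job); // re-queue for another attempt
                }
            }
        }
        catch (OperationCanceledException)
        {
            // Stop() was called; let the worker finish.
        }
    }
}
```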

C# OutOfMemory, Mapped Memory File or Temp Database

Seeking some advice, best practice etc...
Technology: C# .NET4.0, Winforms, 32 bit
I am seeking some advice on how I can best tackle large data processing in my C# Winforms application which experiences high memory usage (working set) and the occasional OutOfMemory exception.
The problem is that we perform a large amount of data processing "in-memory" when a "shopping-basket" is opened. In simplistic terms, when a "shopping-basket" is loaded we perform the following calculations:
For each item in the "shopping-basket", retrieve its historical price going all the way back to the date the item first appeared in stock (could be two months, two years, or two decades of data). Historical price data is retrieved from text files, over the internet, or in any format supported by a price plugin.
For each item, for each day since it first appeared in-stock calculate various metrics which builds a historical profile for each item in the shopping-basket.
As a result, we can potentially perform hundreds, thousands, or even millions of calculations, depending on the number of items in the "shopping-basket". If the basket contains too many items, we run the risk of hitting an OutOfMemoryException.
A couple of caveats:
This data needs to be calculated for each item in the "shopping-basket" and the data is kept until the "shopping-basket" is closed.
Even though we perform steps 1 and 2 in a background thread, speed is important, as the number of items in the "shopping-basket" can greatly affect overall calculation speed.
Memory is reclaimed by the .NET garbage collector when a "shopping-basket" is closed. We have profiled our application and ensured that all references are correctly released and disposed when a basket is closed.
After all the calculations are completed, the resulting data is stored in an IDictionary. CalculatedData is a class whose properties are the individual metrics calculated by the above process.
Some ideas I've thought about:
Obviously my main concern is to reduce the amount of memory used by the calculations; however, the volume of memory used can only be reduced if I
1) reduce the number of metrics being calculated for each day or
2) reduce the number of days used for the calculation.
Neither of these options is viable if we wish to fulfill our business requirements.
Memory Mapped Files
One idea has been to use memory mapped files which will store the data dictionary. Would this be possible/feasible and how can we put this into place?
Use a temporary database
The idea is to use a separate (not in-memory) database which can be created for the life-cycle of the application. As "shopping-baskets" are opened we can persist the calculated data to the database for repeated use, alleviating the requirement to recalculate for the same "shopping-basket".
Are there any other alternatives that we should consider? What is best practice when it comes to calculations on large data and performing them outside of RAM?
Any advice is appreciated.
The easiest solution is a database, perhaps SQLite. Memory-mapped files don't automatically become dictionaries; you would have to code all the memory management yourself, and thereby fight with the .NET GC itself for ownership of the data.
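To show what "code all the memory management yourself" means in practice: a memory-mapped file only gives you a flat range of bytes, so you still need your own index from key to offset. The sketch below assumes an append-only layout of fixed-size records of doubles; the class name MappedMetricStore and the record layout are hypothetical, and there is no removal, growth or persistence of the index.

```csharp
using System;
using System.Collections.Generic;
using System.IO.MemoryMappedFiles;

// Hypothetical append-only store: each record is a fixed number of doubles written
// into a memory-mapped region, and a managed Dictionary maps each key to the
// record's byte offset. Re-putting an existing key leaves the old record behind.
sealed class MappedMetricStore : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;
    private readonly Dictionary<string, long> _offsets = new Dictionary<string, long>();
    private readonly int _metricsPerRecord;
    private readonly long _capacityBytes;
    private long _nextOffset;

    public MappedMetricStore(long capacityBytes, int metricsPerRecord)
    {
        _capacityBytes = capacityBytes;
        _metricsPerRecord = metricsPerRecord;
        // A null map name creates a map that is not shared with other processes
        // and not associated with a file on disk.
        _file = MemoryMappedFile.CreateNew(null, capacityBytes);
        _view = _file.CreateViewAccessor();
    }

    public void Put(string key, double[] metrics)
    {
        if (metrics.Length != _metricsPerRecord)
            throw new ArgumentException("Wrong number of metrics.", nameof(metrics));
        long recordBytes = (long)_metricsPerRecord * sizeof(double);
        if (_nextOffset + recordBytes > _capacityBytes)
            throw new InvalidOperationException("Mapped region is full.");
        // Copy the whole record into the mapped view at the next free offset.
        _view.WriteArray(_nextOffset, metrics, 0, metrics.Length);
        _offsets[key] = _nextOffset;
        _nextOffset += recordBytes;
    }

    public bool TryGet(string key, out double[] metrics)
    {
        metrics = null;
        if (!_offsets.TryGetValue(key, out long offset)) return false;
        metrics = new double[_metricsPerRecord];
        _view.ReadArray(offset, metrics, 0, metrics.Length);
        return true;
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```

Mapping one large view still consumes the process's address space. For data bigger than that you would map a disk file with MemoryMappedFile.CreateFromFile and open smaller views on demand, which is more of the bookkeeping the answer warns about.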
If you're interested in trying the memory mapped file approach, you can try it now. I wrote a small native .NET package called MemMapCache that in essence creates a key/val database backed by MemMappedFiles. It's a bit of a hacky concept, but the program MemMapCache.exe keeps all references to the memory mapped files so that if your application crashes, you don't have to worry about losing the state of your cache.
It's very simple to use and you should be able to drop it in your code without too many modifications. Here is an example using it: https://github.com/jprichardson/MemMapCache/blob/master/TestMemMapCache/MemMapCacheTest.cs
Maybe it would at least help you figure out what you need to do for an actual solution.
Please let me know if you do end up using it. I'd be interested in your results.
However, long-term, I'd recommend Redis.
As an update for those stumbling upon this thread...
We ended up using SQLite as our caching solution. The SQLite database we employ exists separately from the main data store used by the application. We persist calculated data to the SQLite database (diskCache) as it's required, and have code controlling cache invalidation etc. This was a suitable solution for us, as we were able to achieve write speeds of up to around 100,000 records per second.
For those interested, this is the code that controls inserts into the diskCache. Full credit for this code goes to JP Richardson (shown answering a question here) for his excellent blog post.
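using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SQLite;
using System.Text;

internal class SQLiteBulkInsert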
{
#region Class Declarations
private SQLiteCommand m_cmd;
private SQLiteTransaction m_trans;
private readonly SQLiteConnection m_dbCon;
private readonly Dictionary<string, SQLiteParameter> m_parameters = new Dictionary<string, SQLiteParameter>();
private uint m_counter;
private readonly string m_beginInsertText;
#endregion
#region Constructor
public SQLiteBulkInsert(SQLiteConnection dbConnection, string tableName)
{
m_dbCon = dbConnection;
m_tableName = tableName;
var query = new StringBuilder(255);
query.Append("INSERT INTO ["); query.Append(tableName); query.Append("] (");
m_beginInsertText = query.ToString();
}
#endregion
#region Allow Bulk Insert
private bool m_allowBulkInsert = true;
public bool AllowBulkInsert { get { return m_allowBulkInsert; } set { m_allowBulkInsert = value; } }
#endregion
#region CommandText
public string CommandText
{
get
{
if(m_parameters.Count < 1) throw new SQLiteException("You must add at least one parameter.");
var sb = new StringBuilder(255);
sb.Append(m_beginInsertText);
foreach(var param in m_parameters.Keys)
{
sb.Append('[');
sb.Append(param);
sb.Append(']');
sb.Append(", ");
}
sb.Remove(sb.Length - 2, 2);
sb.Append(") VALUES (");
foreach(var param in m_parameters.Keys)
{
sb.Append(m_paramDelim);
sb.Append(param);
sb.Append(", ");
}
sb.Remove(sb.Length - 2, 2);
sb.Append(")");
return sb.ToString();
}
}
#endregion
#region Commit Max
private uint m_commitMax = 25000;
public uint CommitMax { get { return m_commitMax; } set { m_commitMax = value; } }
#endregion
#region Table Name
private readonly string m_tableName;
public string TableName { get { return m_tableName; } }
#endregion
#region Parameter Delimiter
private const string m_paramDelim = ":";
public string ParamDelimiter { get { return m_paramDelim; } }
#endregion
#region AddParameter
public void AddParameter(string name, DbType dbType)
{
var param = new SQLiteParameter(m_paramDelim + name, dbType);
m_parameters.Add(name, param);
}
#endregion
#region Flush
public void Flush()
{
try
{
if (m_trans != null) m_trans.Commit();
}
catch (Exception ex)
{
throw new Exception("Could not commit transaction. See InnerException for more details", ex);
}
finally
{
if (m_trans != null) m_trans.Dispose();
m_trans = null;
m_counter = 0;
}
}
#endregion
#region Insert
public void Insert(object[] paramValues)
{
if (paramValues.Length != m_parameters.Count)
throw new Exception("The values array count must be equal to the count of the number of parameters.");
m_counter++;
if (m_counter == 1)
{
if (m_allowBulkInsert) m_trans = m_dbCon.BeginTransaction();
m_cmd = m_dbCon.CreateCommand();
foreach (var par in m_parameters.Values)
m_cmd.Parameters.Add(par);
m_cmd.CommandText = CommandText;
}
var i = 0;
foreach (var par in m_parameters.Values)
{
par.Value = paramValues[i];
i++;
}
m_cmd.ExecuteNonQuery();
// Once the batch reaches CommitMax rows, commit it via Flush(), which disposes the
// transaction, resets the counter and rethrows any commit failure instead of
// swallowing it (a swallowed failure would silently lose the whole batch).
if (m_counter == m_commitMax) Flush();
}
#endregion
}
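Stripped of the SQLite specifics, the class above is a batching pattern: open a transaction on the first row, execute each insert inside it, commit every CommitMax rows, and Flush the remainder at the end. Grouping inserts this way matters because SQLite otherwise runs each INSERT in its own transaction. The sketch below shows only that counting logic with BCL types; the Batcher class and its callbacks are hypothetical stand-ins for BeginTransaction, ExecuteNonQuery and Commit.

```csharp
using System;

// Hypothetical, database-agnostic version of the CommitMax logic in SQLiteBulkInsert.
sealed class Batcher<T>
{
    private readonly int _commitMax;
    private readonly Action _beginBatch;   // e.g. connection.BeginTransaction()
    private readonly Action<T> _write;     // e.g. set parameter values + ExecuteNonQuery()
    private readonly Action _commitBatch;  // e.g. transaction.Commit()
    private int _counter;

    public Batcher(int commitMax, Action beginBatch, Action<T> write, Action commitBatch)
    {
        if (commitMax < 1) throw new ArgumentOutOfRangeException(nameof(commitMax));
        _commitMax = commitMax;
        _beginBatch = beginBatch;
        _write = write;
        _commitBatch = commitBatch;
    }

    public void Insert(T row)
    {
        // The first row of each batch opens a new transaction.
        if (_counter == 0) _beginBatch();
        _write(row);
        _counter++;
        // A full batch is committed immediately; the next Insert starts a new one.
        if (_counter == _commitMax) Flush();
    }

    // Commits a partially filled batch; call once after the last row. No-op if empty.
    public void Flush()
    {
        if (_counter == 0) return;
        _commitBatch();
        _counter = 0;
    }
}
```

With CommitMax = 3 and seven rows, this opens three batches, commits two of them during Insert, and the final Flush commits the last one.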

Unique EventId generation

I'm using the Windows Event Log to record some events. Events within the Windows Event Log can be assigned a handful of properties, one of which is an EventID.
Now I want to use the EventId to try and group related errors. I could just pick a number for each call to the logging method I make, but that seems a little tedious.
I want the system to do this automatically. It would choose an eventId that is "unique" to the position in the code where the logging event occurred. Now, there are only 65,536 unique event IDs, so there are likely to be collisions, but they should be rare enough to make the EventId a useful way to group errors.
One strategy would be to take the hash code of the stack trace, but that would mean that the first and second calls in the following code would generate the same event ID.
public void TestLog()
{
LogSomething("Moo");
// Do some stuff and then a 100 lines later..
LogSomething("Moo");
}
I thought of walking up the call stack using the StackFrame class which has a GetFileLineNumber method. The problem with this strategy is that it will only work when built with debug symbols on. I need it to work in production code too.
Does anyone have any ideas?
Here is some code you can use to generate an EventID with the properties I describe in my question:
public static int GenerateEventId()
{
StackTrace trace = new StackTrace();
StringBuilder builder = new StringBuilder();
builder.Append(Environment.StackTrace);
foreach (StackFrame frame in trace.GetFrames())
{
builder.Append(frame.GetILOffset());
builder.Append(",");
}
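// Keep only the low 16 bits: valid event IDs are 0-65535. Caution: string.GetHashCode()
// is not guaranteed to be stable between runs (.NET Core randomizes it per process).
return builder.ToString().GetHashCode() & 0xFFFF;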
}
The frame.GetILOffset() method call gives the position within that particular frame at the time of execution.
I concatenate these offsets with the entire stack trace to give a unique string for the current position within the program.
Finally, since there are only 65,536 unique event IDs, I bitwise-AND the hash code with 0xFFFF to extract the least significant 16 bits. This value then becomes the EventId.
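Because string.GetHashCode is not guaranteed to be stable across runs (it varies between .NET versions and platforms, and is randomized per process on .NET Core), the same call site could get a different EventId after a restart. A sketch of the same idea with a deterministic hash: it swaps in 32-bit FNV-1a (not part of the original answer) and builds the key from each frame's method name and IL offset instead of Environment.StackTrace, whose text changes when PDB files are present. The class name StableEventIds is hypothetical.

```csharp
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Text;

static class StableEventIds
{
    // 32-bit FNV-1a over the UTF-8 bytes of the key, truncated to 16 bits so the
    // result is a valid event ID (0-65535). Same input, same output, every run.
    public static int Fnv1a16(string key)
    {
        uint hash = 2166136261u;              // FNV offset basis
        unchecked
        {
            foreach (byte b in Encoding.UTF8.GetBytes(key))
            {
                hash ^= b;
                hash *= 16777619u;            // FNV prime
            }
        }
        return (int)(hash & 0xFFFF);
    }

    // NoInlining keeps this method as a real frame, so skipping one frame
    // reliably starts the key at the caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int GenerateEventId()
    {
        var trace = new StackTrace(1, false);    // skip this frame; no file/line info needed
        var builder = new StringBuilder();
        foreach (StackFrame frame in trace.GetFrames() ?? new StackFrame[0])
        {
            var method = frame.GetMethod();
            builder.Append(method?.DeclaringType?.FullName).Append('.')
                   .Append(method?.Name).Append('@')
                   .Append(frame.GetILOffset()).Append(';');
        }
        return Fnv1a16(builder.ToString());
    }
}
```

The IL offsets still differ between the two LogSomething("Moo") calls in TestLog, so they get different IDs, while repeated executions of one call site on the same call path always get the same one.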
The IL offset number is available without debug symbols. Combined with the stack information and hashed, I think that would do the trick.
Here's an article that, in part, covers retrieving the IL offset (for the purpose of logging it for an offline match to PDB files--different problem but I think it'll show you what you need):
http://timstall.dotnetdevelopersjournal.com/getting_file_and_line_numbers_without_deploying_the_pdb_file.htm
Create a hash using the IL offset of the last-but-one stack frame (i.e. the stack frame of your TestLog method above) instead of the line number.
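A sketch of that suggestion, assuming the logging helper calls the ID generator directly: inside the generator, new StackFrame(2, false) is the frame that called the logger (TestLog in the question), and its method plus IL offset identifies the call site without PDB files. The names CallSiteEventIds and SampleLog are hypothetical, and the FNV-1a hash is a swapped-in choice so the ID stays the same across runs.

```csharp
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Text;

static class CallSiteEventIds
{
    // Event ID for the frame 'framesAbove' levels above this method
    // (1 = whoever called FromCaller, 2 = that method's caller, and so on).
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int FromCaller(int framesAbove = 1)
    {
        var frame = new StackFrame(framesAbove, false);
        var method = frame.GetMethod();
        // Type + method name + IL offset of the call instruction; no PDB needed.
        string key = method?.DeclaringType?.FullName + "." + method?.Name + "@" + frame.GetILOffset();
        uint hash = 2166136261u;                 // 32-bit FNV-1a, truncated to 16 bits
        unchecked
        {
            foreach (byte b in Encoding.UTF8.GetBytes(key))
            {
                hash ^= b;
                hash *= 16777619u;
            }
        }
        return (int)(hash & 0xFFFF);
    }
}

static class SampleLog
{
    // Hypothetical stand-in for the question's LogSomething. NoInlining keeps the
    // frame count predictable, so frame 2 is the line that called LogSomething.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int LogSomething(string message)
    {
        int eventId = CallSiteEventIds.FromCaller(2);
        // ... write 'message' to the event log with 'eventId' here ...
        return eventId;
    }
}
```

Unlike hashing the whole stack, this gives the same ID to a LogSomething call line no matter which path led to it, and different lines in TestLog get different IDs unless the 16-bit hashes happen to collide.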
*Important: This post focuses on solving the root cause of what appears to be your problem instead of providing the solution you specifically asked for. I realize this post is old, but felt it important to contribute.*
My team had a similar issue, and we changed the way we managed our logging which has reduced production support and bug patching times significantly. Pragmatically this works in most enterprise apps my team works on:
Prefix log messages with the "class name"."function name".
For true errors, output the captured Exception to the event logger.
Focus on having clear messages as part of the peer code review as opposed to event id's.
Use a unique event ID for each function; just go top to bottom and key them.
When it becomes impractical to give each function a different event ID, each class should just have a unique one (collisions be damned).
Utilize event categories to reduce reliance on event IDs when filtering the log.
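For the "class name"."function name" prefix in the first item, the [CallerMemberName] attribute (C# 5) lets the compiler fill in the function name instead of hard-coding it at every call. A minimal sketch; the LogPrefix helper, the OrderService example class and the message format are assumptions, and the actual event log write is left out.

```csharp
using System.Runtime.CompilerServices;

static class LogPrefix
{
    // When 'functionName' is omitted, the compiler passes the calling member's
    // name, so renaming a method also updates its log prefix.
    public static string Format(string className, string message,
                                [CallerMemberName] string functionName = "")
    {
        return className + "." + functionName + "(): " + message;
    }
}

class OrderService
{
    public string Save()
    {
        // nameof (C# 6) keeps the class name in step with renames as well.
        return LogPrefix.Format(nameof(OrderService), "Order saved.");
    }
}
```

Calling `new OrderService().Save()` produces the prefix `OrderService.Save(): ` in front of the message, with no method name typed by hand.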
Of course it matters how big your apps are and how sensitive the data is. Most of ours are around 10k to 500k lines of code with minimally sensitive information. It may feel oversimplified, but from a KISS standpoint it pragmatically works.
That being said, using an Event Log wrapper class to simplify the process makes it easy to use, although cleanup may be unpleasant. For example:
MyClass.cs (using the wrapper)
using System;
using System.Diagnostics;
class MyClass
{
// hardcoded, but should be from configuration vars
private string AppName = "MyApp";
private string AppVersion = "1.0.0.0";
private string ClassName = "MyClass";
private string LogName = "MyApp Log";
EventLogAdapter oEventLogAdapter;
EventLogEntryType oEventLogEntryType;
public MyClass(){
this.oEventLogAdapter = new EventLogAdapter(
this.AppName
, this.LogName
, this.AppName
, this.AppVersion
, this.ClassName
);
}
private bool MyFunction() {
bool result = false;
this.oEventLogAdapter.SetMethodInformation("MyFunction", 100);
try {
// do stuff
this.oEventLogAdapter.WriteEntry("Something important found out...", EventLogEntryType.Information);
} catch (Exception oException) {
this.oEventLogAdapter.WriteEntry("Error: " + oException.ToString(), EventLogEntryType.Error);
}
return result;
}
}
EventLogAdapter.cs
using System;
using System.Diagnostics;
class EventLogAdapter
{
//vars
private string _EventProgram = "";
private string _EventSource = "";
private string _ProgramName = "";
private string _ProgramVersion = "";
private string _EventClass = "";
private string _EventMethod = "";
private int _EventCode = 1;
private bool _Initialized = false;
private System.Diagnostics.EventLog oEventLog = new EventLog();
// methods
public EventLogAdapter() { }
public EventLogAdapter(
string EventProgram
, string EventSource
, string ProgramName
, string ProgramVersion
, string EventClass
) {
this.SetEventProgram(EventProgram);
this.SetEventSource(EventSource);
this.SetProgramName(ProgramName);
this.SetProgramVersion(ProgramVersion);
this.SetEventClass(EventClass);
this.InitializeEventLog();
}
public void InitializeEventLog() {
try {
if(
!String.IsNullOrEmpty(this._EventSource)
&& !String.IsNullOrEmpty(this._EventProgram)
){
if (!System.Diagnostics.EventLog.SourceExists(this._EventSource)) {
System.Diagnostics.EventLog.CreateEventSource(
this._EventSource
, this._EventProgram
);
}
this.oEventLog.Source = this._EventSource;
this.oEventLog.Log = this._EventProgram;
this._Initialized = true;
}
} catch { }
}
public void WriteEntry(string Message, System.Diagnostics.EventLogEntryType EventEntryType) {
try {
string _message =
"[" + this._ProgramName + " " + this._ProgramVersion + "]"
+ "." + this._EventClass + "." + this._EventMethod + "():\n"
+ Message;
this.oEventLog.WriteEntry(
_message
, EventEntryType
, this._EventCode
);
} catch { }
}
public void SetMethodInformation(
string EventMethod
,int EventCode
) {
this.SetEventMethod(EventMethod);
this.SetEventCode(EventCode);
}
public string GetEventProgram() { return this._EventProgram; }
public string GetEventSource() { return this._EventSource; }
public string GetProgramName() { return this._ProgramName; }
public string GetProgramVersion() { return this._ProgramVersion; }
public string GetEventClass() { return this._EventClass; }
public string GetEventMethod() { return this._EventMethod; }
public int GetEventCode() { return this._EventCode; }
public void SetEventProgram(string EventProgram) { this._EventProgram = EventProgram; }
public void SetEventSource(string EventSource) { this._EventSource = EventSource; }
public void SetProgramName(string ProgramName) { this._ProgramName = ProgramName; }
public void SetProgramVersion(string ProgramVersion) { this._ProgramVersion = ProgramVersion; }
public void SetEventClass(string EventClass) { this._EventClass = EventClass; }
public void SetEventMethod(string EventMethod) { this._EventMethod = EventMethod; }
public void SetEventCode(int EventCode) { this._EventCode = EventCode; }
}
Thanks for the idea of hashing the call stack; I was going to ask that very same question about how to pick an eventId.
I recommend putting a static variable in LogSomething that increments each time it is called.
"Now I want to use the EventId to try and group related errors."
Event Viewer already has filters, so why go to the trouble? And you only have 65,536 unique event IDs to work with.
Or why not use log4net or something similar?
Just my ideas.
