This is the problem I am trying to solve:
Given a list of movies in a CSV file, extract the movie name and genre, and answer queries by year and by genre. Essentially: parse a CSV input file and build an indexed data store that allows searching the data efficiently.
My Attempt:
Create a Dictionary with Key as genre and value as List of movies in that genre. This dictionary will satisfy when searched by genre.
To get results by year, I was thinking of creating another dictionary with key as year and value as list of movies in that year. This dictionary will satisfy when searched by year.
Now, when we have really large data to read from the CSV, is it wise to create multiple dictionaries, one for each lookup criterion, like I have done? Or should I just load the CSV data into a single list and filter it based on the criteria? That would slow down lookups. Are there any better approaches to this problem?
Also in my code before displaying the values I am sorting by MovieName. Should I sort the list and save in the dictionary itself?
Any feedback related to the code is also appreciated.
CSV file source : https://gist.github.com/tiangechen/b68782efa49a16edaf07dc2cdaa855ea
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Enter Genre");
string genre = Console.ReadLine();
CSVReader csv = new CSVReader(@"C:\Users\Downloads\movies.csv");
var result = csv.ReadCSVFile();
if (result != null)
{
List<MovieDetails> movieDetails;
if (result.TryGetValue(genre.ToUpper(), out movieDetails))
{
movieDetails.Sort((x, y)=> x.MovieName.CompareTo(y.MovieName));
foreach (var item in movieDetails)
{
Console.WriteLine(item.MovieName +" "+item.Genre);
}
}
}
}
}
public class CSVReader
{
public string filePath { get; set; }
public CSVReader(string filepath)
{
this.filePath = filepath;
}
public Dictionary<string, List<MovieDetails>> ReadCSVFile()
{
try
{
string Line;
Dictionary<string, List<MovieDetails>> dictionary = new Dictionary<string, List<MovieDetails>>();
if (!string.IsNullOrEmpty(filePath))
{
using (StreamReader sdr = new StreamReader(filePath))
{
Line = sdr.ReadLine(); // skip the header row
while ((Line = sdr.ReadLine()) != null)
{
List<MovieDetails> movieDetailsList;
string[] input = Line.Split(',');
MovieDetails movieDetails = new MovieDetails()
{
MovieName = input[0],
Genre = input[1].ToUpper(),
LeadStudio = input[2],
Audience = int.Parse(input[3]),
Profitability = float.Parse(input[4]),
RottenTomatoesPercent = int.Parse(input[5]),
WorldwideGross = decimal.Parse(input[6], System.Globalization.NumberStyles.Currency),
Year = int.Parse(input[7])
};
if (dictionary.TryGetValue(movieDetails.Genre, out movieDetailsList))
{
movieDetailsList.Add(movieDetails);
}
else
{
dictionary.Add(movieDetails.Genre, new List<MovieDetails>() { movieDetails });
}
}
return dictionary;
}
}
else
{
return null;
}
}
catch (Exception)
{
throw;
}
}
}
public class MovieDetails
{
public string MovieName { get; set; }
public string Genre { get; set; }
public string LeadStudio { get; set; }
public int Audience { get; set; }
public float Profitability { get; set; }
public int RottenTomatoesPercent { get; set; }
public decimal WorldwideGross { get; set; }
public int Year { get; set; }
}
Index selectivity (the ratio of the average number of records in each "bucket" to the total number of records) is key to designing your program efficiently.
In your sample CSV genre has poor selectivity, because more than half of all movies in the sample are comedies; you would be better off walking the entire list under the circumstances. Same goes for the year: with only four distinct years available, you would be returning a quarter of all records in every search, so you might as well walk the list.
You can get better selectivity by introducing a composite index, i.e. an index with keys combining two or more columns (say, genre + year). This index has better selectivity: you would be placing ten or fewer movies in each "bucket", so when the query asks for a combination, you'd go for an index.
Note that another possibility to run queries faster is to keep your records in sorted order on one of the search keys. This would let you handle queries with a binary search and a forward scan, rather than a full scan of the data. Picking the sort column requires statistics on your search queries, because it may negatively impact queries that are based on other columns.
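To make the composite-index idea concrete, here is a minimal sketch of a dictionary keyed by a (genre, year) tuple. The simplified Movie record here stands in for the question's MovieDetails class; names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the question's MovieDetails class.
public record Movie(string Name, string Genre, int Year);

public class MovieIndex
{
    // Tuple keys give one bucket per (genre, year) combination,
    // which is much more selective than genre or year alone.
    private readonly Dictionary<(string Genre, int Year), List<Movie>> index = new();

    public void Add(Movie movie)
    {
        var key = (movie.Genre.ToUpperInvariant(), movie.Year);
        if (!index.TryGetValue(key, out var bucket))
            index[key] = bucket = new List<Movie>();
        bucket.Add(movie);
    }

    public IReadOnlyList<Movie> Find(string genre, int year) =>
        index.TryGetValue((genre.ToUpperInvariant(), year), out var bucket)
            ? bucket
            : (IReadOnlyList<Movie>)Array.Empty<Movie>();
}
```

Lookups by the combined key are then a single dictionary access rather than a scan of a large genre bucket.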
I'm trying to generate a random ID number so that when the user clicks a button it redirects to a random artist. I was able to do it, and it works quite well with the code below.
var artists = _context.Artists;
var totalArtists = artists.Count();
Random randomArtistNumber = new Random();
int randomArtistID = randomArtistNumber.Next(1, totalArtists + 1);
if (button.Equals("btnArtist"))
{
return RedirectToAction("Details", "ArtistModels", new { id = randomArtistID.ToString() });
}
The problem here is that if the user deletes an artist, I'm going to have an ID number that no longer exists in the range generated above. How would I go about creating a List<> of all ArtistIDs in the Artist table? It would be much better to just pick a random number from within a list of active IDs.
ArtistModel below
public class ArtistModel
{
[Key]
public int ArtistID { get; set; }
public string ArtistName { get; set; }
public string CountryOfOrigin { get; set; }
public int SubgenreID { get; set; }
public SubgenreModel Subgenre { get; set; }
public ICollection<AlbumModel> Albums { get; set; }
}
I'm guessing you are using Entity Framework?
You should be able to get a list of ids from your artists by doing something like this.
var artistIds = artists.Select(a => a.ArtistID).ToList();
To do that correctly you'd need to ensure that you were always loading all artists from the context.
It might be better if your random function had a limit, e.g. randomly select from the first 200 artists or something.
Since you already have the artists, what you can actually do is get a random artist; it wouldn't be much different from what you have.
Something like:
int randomArtistIndex = randomArtistNumber.Next(0, totalArtists); // upper bound is exclusive
var artist = artists.OrderBy(a => a.ArtistID).Skip(randomArtistIndex).First();
and then
return RedirectToAction("Details", "ArtistModels", new { id = artist.ArtistID.ToString() });
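As a sketch of the ids-list approach the question asks about (the helper name and the List<int> of ids are illustrative, standing in for the ids materialized from the Artists table): once the active ids are in a list, picking one that is guaranteed to exist is just an index into that list.

```csharp
using System;
using System.Collections.Generic;

public static class RandomPicker
{
    private static readonly Random Rng = new();

    // Picks a random id from the ids that actually exist, so a deleted
    // artist's id can never be selected.
    public static int PickRandomId(IReadOnlyList<int> activeIds)
    {
        if (activeIds.Count == 0)
            throw new InvalidOperationException("No artists available.");
        // Next(0, Count) is exclusive of Count, so every index is valid.
        return activeIds[Rng.Next(activeIds.Count)];
    }
}
```

Usage would then be something like `var id = RandomPicker.PickRandomId(artistIds);` before the redirect.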
I am looking for a solution and was hoping some of you can help me with this. I have 3 CSV files, each with a product code and a price. After I import the first CSV file, I want the program to detect duplicate product codes and put the price for that product code next to the price from the previous CSV file.
Images will probably make it more clear about what I am talking about.
Example of what a CSV file looks like:
After 1 csv import:
With 3 csv imports
It's hard to help you if we can't see any code. I tried to come up with something; it may help you get an idea of how it could work.
The .csv you are using has different formats for the price: sometimes it is
x,yz and sometimes you get x.yz. I would recommend using '.', because then you can parse it directly into a float (price). Also remove the € from the price value; a price in a database or in a data file like a .csv should not be a string. If you decide to keep the symbol, you can remove the first char with:
yourString.Remove(0, 1);
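A hedged sketch of that cleanup (the helper name is made up, and it assumes the only variations are a leading currency symbol and ',' vs '.' as the decimal separator):

```csharp
using System;
using System.Globalization;

public static class PriceParser
{
    // Normalizes cells like "€12,34", "12.34" or " 5,00 " into a float.
    public static bool TryParsePrice(string raw, out float price)
    {
        string cleaned = raw.Trim().TrimStart('€', '$').Replace(',', '.');
        return float.TryParse(cleaned, NumberStyles.Float,
                              CultureInfo.InvariantCulture, out price);
    }
}
```

Using the invariant culture keeps the result independent of the machine's regional settings, which is exactly the trap the mixed ',' / '.' data would otherwise fall into.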
First of all here some info about the files and locations I have:
string path = @"C:\__ER2S3_Abteilung\StackOverFlow\Data_product_price\";
string[] files = new string[]
{
"first_list.csv",
"second_list.csv",
"third_list.csv"
};
I created a class that looks like this:
public class Product
{
public string ProductCode { get; set; }
public float? FirstPrice { get; set; } = null;
public float? SecondPrice { get; set; } = null;
public float? ThirdPrice { get; set; } = null;
public float AvgPrice { get; set; }
}
I created a list that will save all the products you read.
List<Product> products = new List<Product>();
Then read files and add all the products, but check if the products already exist in the list when adding them:
foreach(string item in files)
{
string fullpath = path + item;
using (StreamReader reader = new StreamReader(fullpath))
{
while (!reader.EndOfStream)
{
    string line = reader.ReadLine();
    string[] parts = line.Split('|');
    // Skip malformed lines; TryParse avoids parsing twice inside a try/catch.
    if (parts.Length < 2 || !float.TryParse(parts[1], out float price))
    {
        continue;
    }
    string productname = parts[0];
if (!products.Exists(e => e.ProductCode == productname))
{
products.Add(new Product() { ProductCode = productname , FirstPrice = price });
}
else
{
foreach(Product _product in products)
{
if(_product.ProductCode == productname)
{
if(_product.SecondPrice == null)
{
_product.SecondPrice = price;
}
else
{
_product.ThirdPrice = price;
}
}
}
}
}
}
}
Bind the list of products to your DataGridview. The columns of the DataGridView are the properties in the object Product.
For the average calculation, add the following:
foreach (Product product in products)
{
    // Guard against null prices: casting a null float? throws, so average
    // only the prices that were actually read (requires System.Linq).
    var prices = new[] { product.FirstPrice, product.SecondPrice, product.ThirdPrice };
    product.AvgPrice = prices.Any(p => p.HasValue)
        ? prices.Where(p => p.HasValue).Average(p => p.Value)
        : 0f;
}
Don't forget to implement the INotifyPropertyChanged interface and use it; this will automatically propagate changes made to the list to your UI.
It is not the most beautiful way, but it works.
I tested it: I get a list of Products with all the values from the different .csv files, plus the calculated AvgPrice.
I've developed a UWP app where I use a SQLite database to store data that is synced.
Among these data, there are a lot of tables that contain translated data. For example, in various cases we have:
a "businness" table, which contains the id that is really used in the database
a "translation" table, which contains translations for the business table
The models of the "business" tables are defined like this:
public class Failure_Type : BasePoco
{
[PrimaryKey, NotNull]
public int failure_id { get; set; }
public int? function_type_id { get; set; }
public int? component_category_id { get; set; }
[MaxLength(200), NotNull]
public string description { get; set; }
public DateTime? disable_date { get; set; }
[Ignore]
public string _descriptionTr { get; set; }
}
The field "description" stores the english/default description, and the "_descriptionTr" field will store the translated description.
The models of the "translation" tables are defined like this:
public class Failure_Type_Translation : BasePoco
{
[PrimaryKey, NotNull]
public int failure_type_translation_id { get; set; }
public int? failure_type_id { get; set; }
[MaxLength(2)]
public string language { get; set; }
[MaxLength(200), NotNull]
public string description { get; set; }
}
The field "failure_type_id" is related to the business table, the other fields store the language code and the related translation.
So, after syncing data into the SQLite database, I refresh the "translated" data in the app, and this can take a long time. Loading the 2 tables from SQLite is very quick, but updating the "_descriptionTr" field can be very slow:
var failureType = ServiceLocator.Current.GetInstance<IRepository>().GetAll<Failure_Type>();
var failureTypeTranslations = ServiceLocator.Current.GetInstance<IRepository>().GetAll<Failure_Type_Translation>();
FailureType = new ObservableCollection<Failure_Type>();
foreach (var ft in failureType)
{
var ftt = failureTypeTranslations.FirstOrDefault(i => i.failure_type_id == ft.failure_id && i.language.ToLower().Equals(lang));
if (ftt != null)
ft._descriptionTr = ftt.description;
else
ft._descriptionTr = ft.description;
FailureType.Add(ft);
}
Is there a better way for doing this?
How could I optimize it?
Edit :
the "business" table contains 550 rows
the "translation" table contains 3500 rows
the duration of the loop is nearly 1 minute
A couple of suggestions:
Create the observable collection at once ...
FailureType = new ObservableCollection<Failure_Type>(failureType);
... so the individual additions don't fire notifications. Now use FailureType in the loop.
Instead of fetching all translations, filter them by lang:
var failureTypeTranslations = ServiceLocator.Current.GetInstance<IRepository>()
    .GetAll<Failure_Type_Translation>()
    .Where(t => t.language == lang);
Create a dictionary for lookup of known translations:
var dict = failureTypeTranslations
    .Where(ftt => ftt.failure_type_id.HasValue)
    .ToDictionary(ftt => ftt.failure_type_id.Value);
foreach (var ft in FailureType)
{
Failure_Type_Translation ftt;
if (dict.TryGetValue(ft.failure_id, out ftt))
ft._descriptionTr = ftt.description;
else
ft._descriptionTr = ft.description;
}
I think that especially the failureTypeTranslations.FirstOrDefault part kills performance: that query is executed once for every iteration of the loop.
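Putting the suggestions together, here is a self-contained sketch of the dictionary-lookup approach. FtRow and FttRow are simplified stand-ins for Failure_Type and Failure_Type_Translation, since the real repository types are not shown:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for Failure_Type and Failure_Type_Translation.
class FtRow  { public int Id; public string Description = ""; public string DescriptionTr = ""; }
class FttRow { public int? FtId; public string Language = ""; public string Description = ""; }

static class Translator
{
    // O(n + m): one pass to index the translations, one pass to apply them,
    // instead of a FirstOrDefault scan per business row.
    public static void Apply(List<FtRow> types, List<FttRow> translations, string lang)
    {
        var byId = translations
            .Where(t => t.FtId.HasValue &&
                        t.Language.Equals(lang, StringComparison.OrdinalIgnoreCase))
            .ToDictionary(t => t.FtId.Value);

        foreach (var ft in types)
            ft.DescriptionTr = byId.TryGetValue(ft.Id, out var t)
                ? t.Description
                : ft.Description; // fall back to the default description
    }
}
```

With 550 business rows and 3500 translations, this replaces roughly 550 × 3500 comparisons with one dictionary build plus 550 lookups.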
I have this class:
public class Course
{
public String Name { set; get; }
public int Code { set; get; }
public List<String> PreRequireCources = new List<String>();
}
and PreRequireCources list is filled by a listbox like this:
Course cu = new Course();
cu.Name = txtName.Text;
cu.Code = Convert.ToInt32(txtCode.Text);
cu.PreRequireCources= lstPreRequirsist.Items
.Cast<string>().ToList();
and this is my course table:
If I do it like this I get an error, because PreRequireCources is a List but the database column is ntext:
var db = new LinqDataContext();
db.Cources.InsertOnSubmit(cc);
So how can I save this list to my database? Are there any better ways to save this list to the database for every student?
Thanks.
You are trying to insert a list of values into a column which expects a single text value, so you can join all the strings in the list into one string:
cu.PreRequireCources = string.Join(", ", lstPreRequirsist.Items
    .Cast<string>());
Replace ", " with your preferred delimiter.
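For completeness, the round trip might look like the sketch below: join on write, split on read. The helper names and the ',' delimiter are assumptions, not from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PreReqStorage
{
    // Write side: collapse the list into one delimited string for the ntext column.
    public static string ToColumn(IEnumerable<string> courses) =>
        string.Join(",", courses);

    // Read side: split the column value back into a list.
    public static List<string> FromColumn(string stored) =>
        stored.Split(',', StringSplitOptions.RemoveEmptyEntries).ToList();
}
```

Note this only works if course names can never contain the delimiter, which is one reason the cross-table design below is preferable.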
There are two options:
1. Concatenate the strings into one string (not so good)
You could change your Course class to this:
public class Course
{
public String Name { set; get; }
public int Code { set; get; }
public String PreRequireCources = string.Empty;
}
and fill it from the listbox like this:
cu.PreRequireCources = string.Join(",", lstPreRequirsist.Items.
Cast<string>());
This way you would store the courses as a comma-separated list in the PreRequire column.
Note that this is not a good way to store this kind of data. There seems to be an m:n relationship between a Course and its PreRequireCources, so...
2. Create a cross table to define the m:n relationship
The better way to solve this is to create a table named something like PreRequiredCoursesToCourses. Let's say you have these courses
Code  Name     PreRequireCourses
=================================
1     Course1  Course2, Course3
2     Course2  Course3
3     Course3
Then remove the PreRequireCourses from that table and build a cross table instead:
CourseCode PreRequiredCode
=============================
1 2
1 3
2 3
This makes your data design more efficient and flexible. Storing strings that you have to parse again to find out which courses are required is not a good idea.
If you want to request the information later, you can now use a JOIN to get all the courses that are pre-required for a course.
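As an illustration of that join, here is a sketch with hypothetical in-memory stand-ins for the courses table and the cross table (the record and method names are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the courses table and the cross table.
public record CourseRow(int Code, string Name);
public record PreReqLink(int CourseCode, int PreRequiredCode);

public static class PreReqQuery
{
    // Resolve the pre-required course names for one course via a join
    // on the cross table, instead of parsing a delimited string.
    public static List<string> PreRequiredNames(
        int courseCode, List<CourseRow> courses, List<PreReqLink> links)
    {
        return (from link in links
                where link.CourseCode == courseCode
                join c in courses on link.PreRequiredCode equals c.Code
                select c.Name).ToList();
    }
}
```

With the sample data from the tables above, querying course 1 would yield Course2 and Course3; the same shape translates directly to a SQL JOIN or a LINQ-to-SQL query.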
You could have your List as a not mapped property, and have another string column that gets mapped.
Then you can serialize your list, for example, using Newtonsoft.Json:
public class Course
{
public String Name { set; get; }
public int Code { set; get; }
// Mapped to column "PreRequire"
public String PreRequire {
get {
return JsonConvert.SerializeObject(PreRequireCources);
}
set {
PreRequireCources = JsonConvert.DeserializeObject<List<string>>(value);
}
}
// Not mapped to any column
public List<String> PreRequireCources = new List<String>();
}
First of all, I'm sorry if you feel this question has been raised before, but I can't seem to wrap my mind around this one, although it's really not the hardest thing to do.
Basically I have a query result from sql which holds several rows, existing out of :
id, parentid, name, description, level
level is the depth of the item viewed as a tree structure represented as a positive integer.
Now I would love to parse/convert this flat data into a List<Item> mySqlData, where Item is defined like the following class:
public class Item
{
public string Id { get; set; }
public string ParentId { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public List<Item> Items { get; set; } = new List<Item>();
}
Can anybody give me some example code? It's probably going to be something along the lines of recursively iterating through the list while adding the items in their place.
Thanks in advance.
Assuming you want to build the tree, and don't get the data out of order, you should be able to maintain a lookup as you go, i.e.
var idLookup = new Dictionary<int, Item>();
var roots = new List<Item>();
foreach([row]) {
Item newRow = [read basic row];
int? parentId = [read parentid]
Item parent;
if(parentId != null && idLookup.TryGetValue(parentId.Value, out parent)) {
parent.Items.Add(newRow);
} else {
roots.Add(newRow);
}
idLookup.Add(newRow.Id, newRow);
}
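A self-contained version of that sketch, with int ids and in-memory rows standing in for the SQL result (the `[row]` placeholders above are whatever your data reader yields; the Node type here is a simplified stand-in for the question's Item class):

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public int Id;
    public int? ParentId;
    public string Name = "";
    public List<Node> Items = new();
}

static class TreeBuilder
{
    // Single pass over the rows; assumes parents appear before their
    // children (e.g. rows ordered by level), otherwise a child whose
    // parent has not been seen yet is misfiled as a root.
    public static List<Node> Build(IEnumerable<Node> rows)
    {
        var byId = new Dictionary<int, Node>();
        var roots = new List<Node>();
        foreach (var row in rows)
        {
            if (row.ParentId is int pid && byId.TryGetValue(pid, out var parent))
                parent.Items.Add(row);
            else
                roots.Add(row);
            byId.Add(row.Id, row);
        }
        return roots;
    }
}
```

If the rows can arrive out of order, a second pass (or sorting by level first) is needed before linking children to parents.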