Validation of object within a list in Parallel C# - c#

I'm implementing an import functionality, wherein I need to read the data from the user uploaded CSV or Excel file and run a few validations and sanitize the data before writing the data to DB.
I'm able to get the data from the file to a list of objects, with the following structure:
public class Order
{
    public string Sku { get; set; }
    public decimal Cost { get; set; }
    public DateTime OrderFulfillmentStartDate { get; set; }
    public DateTime OrderFulfillmentEndDate { get; set; }
    public string ValidationErrors { get; set; }
}
Following are the validations that need to happen within the objects in the list, a few examples below:
No two orders with the same SKU and OrderFulfillmentStartDate, OrderFulfillmentEndDate are allowed.
No two orders with the same SKU and overlapping OrderFulfillmentStartDate, OrderFulfillmentEndDate are allowed.
etc.
The way I implemented it:
During the first encounter of a distinct record (passing all the validations AND "ValidationErrors" == string.Empty), I add the record to a temp list.
During the next iteration, I validate the record currently being processed against the records in the temp list; if the validation fails, I populate the "ValidationErrors" field and add it to the temp list.
E.g:
Now coming to the crux of the issue:
The size of the data can be around one million rows.
When I implemented the validations sequentially using a foreach loop, the validation process took more than 8 hours.
Having said that, I believe doing the validation in parallel would cut down the time needed drastically.
I did try implementing the logic using Parallel.ForEach and the Partitioner concept. The processing did speed up, but I'm not sure how I can keep a temp list that can be used and updated by multiple threads in the ForEach loop during the validation.
Is there a better or quicker approach to achieve what I'm trying to do here? Please do let me know about it.
Thank You!
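One possible direction, sketched here under assumptions that are not stated in the question: the overlap rule only ever compares orders that share a SKU, so the orders can be grouped by SKU, sorted by start date within each group, and checked in a single pass against the last accepted order. That replaces the scan of the whole temp list with an O(n log n) sort, and because each SKU group is independent, the groups can be handed to Parallel.ForEach without any shared list. The class name OrderValidator, the method FlagOverlaps, and the error message are hypothetical; the sketch also assumes that end dates are inclusive, that end is never before start, and that the earliest-starting order in a group is the one that "wins" (rather than the first one in file order).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class Order
{
    public string Sku { get; set; }
    public decimal Cost { get; set; }
    public DateTime OrderFulfillmentStartDate { get; set; }
    public DateTime OrderFulfillmentEndDate { get; set; }
    public string ValidationErrors { get; set; } = string.Empty;
}

// Hypothetical helper, not part of the original post.
public static class OrderValidator
{
    public static void FlagOverlaps(IReadOnlyCollection<Order> orders)
    {
        // Every order belongs to exactly one SKU group, so no two threads
        // ever touch the same Order and no shared temp list is needed.
        Parallel.ForEach(orders.GroupBy(o => o.Sku), group =>
        {
            Order lastAccepted = null;

            // Sorted by start date, an order can only collide with the most
            // recently accepted order of the same SKU.
            foreach (var order in group.OrderBy(o => o.OrderFulfillmentStartDate))
            {
                if (lastAccepted != null &&
                    order.OrderFulfillmentStartDate <= lastAccepted.OrderFulfillmentEndDate)
                {
                    // Identical ranges are caught here too, since they overlap.
                    order.ValidationErrors += "Overlaps another order with the same SKU. ";
                }
                else
                {
                    lastAccepted = order;
                }
            }
        });
    }
}
```

Flagged orders are deliberately not used as the comparison baseline, which mirrors the idea of only keeping valid records in the temp list. If other validations need a cross-thread lookup, a ConcurrentDictionary keyed by SKU is the usual alternative to a shared List.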

Related

What is the most efficient method of detecting whether an object's content has changed and update the object? C#

The Problem
Often controller classes (MVC, MVVM, MVP) that manipulate views have access to a multitude of services. The purpose of one of these services is to update the controller classes with data that is pertinent to the controller.
Take this class for example:
public sealed class RestaurantInformation
{
    public IEnumerable<Networking.Models.ServerModels.Restaurant> NearestRestaurants { get; internal set; }
    public IEnumerable<Networking.Models.ServerModels.Restaurant> NewestRestaurants { get; internal set; }
    public IEnumerable<Networking.Models.ServerModels.Category> Categories { get; internal set; }
}
Whenever the service receives updated information from the network regarding Categories, NewestRestaurants or NearestRestaurants, it packages all of the data into an object which has the class type RestaurantInformation and sends it to the controller class to pass to the view.
I decided to put my C grade in my art GCSE to good use and construct diagrams to aid your understanding of my problem. (Apologies for what you are about to see.)
As you can now see, the flow is as follows:
The view loads which in turn calls the RestaurantViewControl.
The RestaurantViewControl then calls the RestaurantService to retrieve the new categories from the API.
The API returns the new categories to the RestaurantService. (You can see here the restaurant service now has a list that contains B).
The RestaurantService then notifies the RestaurantViewControl using the class above with the new list of categories!
We must now update the list of categories in the RestaurantViewControl in the most efficient way possible with the new items.
I am currently clearing the categories list and then replacing all the values with the new list. The two things I want to know are:
What is the most efficient way to detect a change in the Categories List object?
What is the most efficient way to update the categories list, which may still contain objects that are completely valid in that list?
Seems like you have a straightforward issue. You will have a services layer that is called when you show the restaurant list page.
So your collectionView/listView just displays the list of items in the view cell based on that data. One example: https://almirvuk.blogspot.com/2019/07/lets-play-with-collectionview-layouts.html?m=1
Usually you'll just check for changes the first time you visit the page, on pull to refresh, or, if you set up caching, after a set time when the cache expires.
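For the change-detection part of the question, a minimal sketch of merging the incoming list into the existing one instead of clearing and refilling it. The Category class below is a hypothetical stand-in for Networking.Models.ServerModels.Category, and the sketch assumes each category has a unique Id and that a differing Name is what counts as a change; CategorySync and Merge are illustrative names, not part of any framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real server model.
public sealed class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CategorySync
{
    // Updates 'current' in place and returns true if anything changed.
    public static bool Merge(IList<Category> current, IEnumerable<Category> incoming)
    {
        var incomingList = incoming.ToList();
        var incomingById = incomingList.ToDictionary(c => c.Id); // assumes unique Ids
        var changed = false;

        // Walk backwards so RemoveAt does not shift the items still to visit.
        for (var i = current.Count - 1; i >= 0; i--)
        {
            if (!incomingById.TryGetValue(current[i].Id, out var fresh))
            {
                current.RemoveAt(i);          // no longer present on the server
                changed = true;
            }
            else if (fresh.Name != current[i].Name)
            {
                current[i] = fresh;           // same Id, updated content
                changed = true;
            }
        }

        // Append anything that is new, keeping the incoming order.
        var knownIds = new HashSet<int>(current.Select(c => c.Id));
        foreach (var item in incomingList)
        {
            if (knownIds.Add(item.Id))
            {
                current.Add(item);
                changed = true;
            }
        }

        return changed;
    }
}
```

Items that are still valid stay in the list untouched, and the boolean result tells the controller whether the view needs refreshing at all.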

ASP.NET MVC WEB API

New to MVC. I did the tutorial at [http://www.asp.net/web-api/overview/getting-started-with-aspnet-web-api/build-a-single-page-application-(spa)-with-aspnet-web-api-and-angularjs] and from this you produce a question and answer website. If I wanted to maintain progress, i.e. keep a count of the number of questions correctly answered, do I need to calculate this value by retrieving the db.TriviaAnswers object, or do I need to add a Count property to the TriviaAnswer class, or do I need a separate variable? And then how do I maintain state between requests? For example, ViewBag is not available in the
public async Task<IHttpActionResult> Post(TriviaAnswer answer){...}
method.
OPTION 1 as suggested below:
namespace GeekQuiz.Models
{
    using System.ComponentModel.DataAnnotations;
    using System.ComponentModel.DataAnnotations.Schema;
    using Newtonsoft.Json;

    public class TriviaResults
    {
        [Required, Key, Column(Order=1)]
        public string UserId { get; set; }

        [Required, Key, Column(Order=0)]
        public virtual int QuestionId { get; set; }
    }
}
This code throws an InvalidOperationException in the method:
private async Task<TriviaQuestion> NextQuestionAsync(string userId)
on the first line of code.
lastQuestionId = ...
I went over this tutorial a few months ago.
Option 1: If you want to track progress, I assume you mean progress per user. In that case I would advise you to add a table to the DB which saves the user IDs and the IDs of the questions that were correctly answered. That's in case you want to save this as persistent data per user.
Option 2: If you want the same thing, saving the data per user but only for this session, you can save the data in the session variable as a dictionary<userid, list<questionid>>.
One thing you should notice is that those questions repeat in an endless loop, so you might want to change that.
In both options, when you need to know the count you can just go to the table or dictionary and get the number of correct answers.
I hope that answers your question.
To use the session var:
Session["name"] = value;
Session.Remove("name");

DynamoDB and .NET Object Persistence Model, using 'ADD' values instead of overwrite

I have several DynamoDB tables that will act as aggregate data stores (we are 'rolling up' reports on the fly). I was hoping to use the .NET Object Persistence Model (where you build classes that have annotations on them). The problem is that the DynamoDBContext object only seems to have a 'Save' method and not a 'Save and Add values' method. Because the time between retrieving an object from Dynamo and writing to that row again could be more than trivial, and more than one thread could be attempting to increment, I don't want the increment to be done in the .NET code. Instead I want the ADD AttributeAction. But I'm not sure if you can specify an attribute action with the Object Persistence Model. Does anyone know if that's possible?
[DynamoDBTable("my_table")]
public class MyRecord
{
    [DynamoDBHashKey(AttributeName = "my_id")]
    public string MyID { get; set; }

    /// <summary>
    /// Hash of the Region and Country fields for unique data lookup from DynamoDB
    /// </summary>
    [DynamoDBRangeKey(AttributeName = "location")]
    public string Location { get; set; }

    [DynamoDBProperty("my_count")]
    public int MyCount { get; set; }
}
Above is a sample object. The idea is that MyID gets several 'counts' which represent user actions. I don't want to have to get mycount then add 1 in .NET then re-push. I'd rather run the 'Add' command and always send '1' to mycount and have dynamo do the math to guarantee correctness.
Since I'm not finding a ton of resources on this, I've decided to write my own extension method for this. It's not perfect in that you can't divine the DBClient object from the context because it's not public, so you have to pass it in.
https://gist.github.com/edgiardina/9815520
However, I'll leave this question unanswered since I don't know if there's an easier way to execute this.

DB Design, To normalize or not?

Using MS visual studio 2012, Asp.net C# MVC 4, Entity Framework, NopCommerce(cms).
Hi guys, I have a database design query that has actually confused me; normally I have no problems with DBs.
However, since transitioning to the Code First approach, I asked myself this question...
I am creating a new plugin for my NopCommerce CMS website. This plugin will be an ImageGallery plugin.
I would like the data layer to store an
ID,
Name,
LargeImg,
SmallImg,
Urlimg
But I also want to realize the functionality of this plugin. The user should be able to upload any image and then associate this image with a section of their choosing. What I mean by this is an image to a blog post, an image to a news post, an image to a product post, or all three.
Now, these three examples are the only ones I can think of, but as you may have guessed, this may change depending on additional content types.
Instantly I thought: easy, we simply create a field called... Type? Or ContentType?
This field will then store the "type" of image association, whether it is a blog, news, or product item.
At which point I thought, "But what if an image has multiple associations?"
Which brings me to the question: in this situation, should I:
1. Create separate columns for each "content type" (non-normalized)
2. Create one column called "content-type" (normalized)
3. Create a completely separate table called "content-types" and use a relation
For some reason I'm stuck, I don't normally draw a blank on DB design and implementation.
The code below is the domain class in my plugin. I went for number 2, but I'm not sure whether to continue down this road.
using Nop.Core;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Hroc.Plugin.Misc.ImageGallery.Domain
{
    public class ImageItem : BaseEntity
    {
        public virtual int ImageID { get; set; }
        public virtual string Name { get; set; }
        public virtual byte[] Largeimg { get; set; }
        public virtual byte[] Smallimg { get; set; }
        public virtual string URLimg { get; set; }
        public virtual string Typeimg { get; set; }
        public virtual int LinkID { get; set; }
    }
}
Hopefully you guys can point out the correct way to implement this, thanks!
With everything there is a trade-off.
Benefits of normalization in your case:
Extensibility - adding another content type requires no structure/class change
Smaller tables (with variable-length data the difference may not be significant)
Drawbacks:
Querying - if you need to pull multiple types in one query you'll need to de-normalize.
Integrity Overhead - possibility of orphaned data if not managed properly
If I were designing this feature I would go with Option 3 - normalizing the content types has other advantages such as being able to use that table for drop-down lists.
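As a rough illustration of what Option 3 could look like in Code First terms, here is a minimal sketch. All names other than ImageItem (ContentType, ImageAssociation, the property names) are hypothetical, and the classes are plain POCOs rather than nopCommerce BaseEntity subclasses so that the example stands alone; in the plugin they would presumably derive from BaseEntity and be mapped in the usual way.

```csharp
using System.Collections.Generic;

// Lookup table: one row per kind of content an image can be attached to.
public class ContentType
{
    public int Id { get; set; }
    public string Name { get; set; } // e.g. "Blog", "News", "Product"
}

// The image itself no longer carries a type or link column.
public class ImageItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public byte[] LargeImg { get; set; }
    public byte[] SmallImg { get; set; }
    public string UrlImg { get; set; }

    // One image can have any number of associations.
    public virtual ICollection<ImageAssociation> Associations { get; set; }
        = new List<ImageAssociation>();
}

// Junction table: ties one image to one content item of a given type.
public class ImageAssociation
{
    public int Id { get; set; }
    public int ImageItemId { get; set; }
    public int ContentTypeId { get; set; }
    public int LinkId { get; set; } // Id of the blog post, news item, or product
}
```

With this shape, attaching an image to both a blog post and a product is two ImageAssociation rows, and adding a new content type is a new ContentType row rather than a schema change.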

C# Sort object with listcollection member

I have an MVC web app where users upload a text file and I parse it out.
The requirement just changed and they will be uploading multiple files of the same kind now. I parse a single file by sending a file-path to the method below, ReadParts which opens a stream and calls the method parseReplicateBlock to retrieve desired fields. For multiple files I could read all the files into one big stream but I am afraid it could exceed the buffer limit etc.
So I am thinking to parse file by file and populate results into an object. My requirement then, is to sort the records based on a date field.
I just need some help writing this method, ReadLogFile, in a better way, especially for sorting based on initiationDate and initiationTime. I want to find the minimum record based on initiationDate and initiationTime and then do some other logic.
The problem is that if I sort the list member within the object, I would lose the positions of the other records.
You appear to be storing each field of the record in a separate collection within LogFile. This seems a very strange way to store your data.
If you sort one of these collections, then of course it will no longer bear any relationship to the other fields, since they are unrelated. There is also huge scope for bugs if you are relying on all the collections tallying up (e.g. if a field is missing from one of the parsed records).
Instead you should have a class that represents a SINGLE record, and then LogFile has a SINGLE collection of these records, e.g.:
public class ReplicateBlock
{
    public string ReplicateId { get; set; }
    public string AssayNumber { get; set; }
    public DateTime InitiationDate { get; set; }
    //etc
}

public class LogFile
{
    public List<ReplicateBlock> ReplicateBlocks = new List<ReplicateBlock>();
}
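With that structure, sorting and finding the earliest record could look roughly like the sketch below. It repeats the two classes so it compiles on its own, and it assumes InitiationDate holds the combined date and time (the question keeps them as separate fields); LogFileExtensions, SortByInitiation, and Earliest are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ReplicateBlock
{
    public string ReplicateId { get; set; }
    public string AssayNumber { get; set; }
    public DateTime InitiationDate { get; set; } // assumed to include the time of day
}

public class LogFile
{
    public List<ReplicateBlock> ReplicateBlocks = new List<ReplicateBlock>();
}

public static class LogFileExtensions
{
    // Sorts the records in place; each record keeps all of its own fields.
    public static void SortByInitiation(this LogFile log)
    {
        log.ReplicateBlocks.Sort((a, b) => a.InitiationDate.CompareTo(b.InitiationDate));
    }

    // Returns the record with the earliest initiation, or null if there are none.
    public static ReplicateBlock Earliest(this LogFile log)
    {
        return log.ReplicateBlocks.OrderBy(b => b.InitiationDate).FirstOrDefault();
    }
}
```

Records parsed from several files can simply be added to the same ReplicateBlocks list before sorting, so there is no need to merge the files into one stream.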
I have to say that your code is very difficult to follow. The fact that all your functions are static makes me think that you're not particularly familiar with object oriented programming. I would suggest getting a good book on the subject.
