Validating an Excel file in a web application - C#

I've been tasked with allowing a user to upload an Excel file and validating the data within the file.
I've researched ways to validate the data on the client side (in JavaScript) before the user submits the Excel file, but it looks like you can't do this.
It seems that the only way is to post the file to the server and do the validation there.
This seems like it could be time intensive. As of right now we don't really know how big these files can be, so we really don't want to bog down our server.
Is there any good way to do this?
Also, I am not saving the Excel file on the server; I just care about the data in it.
This is an ASP.NET application.

There are tools to parse Excel files in JavaScript, so it is doable, but you need to provide more info. What makes a valid file? Is there a particular format, or do you just want to see if it will open in Excel?

This sort of validation definitely should be done server-side. Even if there is client-side validation, you can't trust it or assume that it ran properly and would want to re-validate server-side anyway.
If the file is potentially large and validation could take a long time (more than a few moments) then you'll want to perform this sort of task asynchronously. As a general architectural guideline, in the past I've done this as follows:
Have two database tables to hold the spreadsheet data. One for the data itself (record-level) and one for the overall status of the uploaded spreadsheet (file-level). Something like this:
Spreadsheets
----------
ID
User
Status
DateUploaded
DateProcessed
etc.
SpreadsheetRecords
----------
ID
SpreadsheetID
SomeValueFromTheData
TheNextValueFromTheData
etc.
ValidationMessage
User uploads the spreadsheet, which is converted to strongly-typed data and persisted somewhere. This happens in-line immediately in the handling of the request in the web application (synchronously). If this fails, the file itself is invalid to the point that it can't be run through the business logic and an error is displayed to the user. You can use any number of techniques/libraries/tools to convert spreadsheets into in-memory objects, depending on the format of the spreadsheet. (A Google search for "convert OpenXML to DataTable in C#" might be a place to start, for example; a rough parsing sketch is included at the end of this answer.)
User is presented with a response indicating that the upload was successful and processing will begin. At this point all of the "records" in the spreadsheet are saved in a table and linked to a record in another table which maintains the status of the processing, the user associated with it, etc. Any time the user re-loads the page they just see whatever the current status of the processing is. This part depends heavily on the UI. What I had before was a page where the user can check the status of their previous uploads. They can upload in bulk and the system will just process each file in turn, for example.
A background process (Windows Service in my case) continually polls the Spreadsheets table for new records which are in a "pending" status and "processes" that data. This is where the "validation" takes place, which could mean just about anything depending on the business rules. For each record in SpreadsheetRecords perform the business logic and store the result (for example, if there's an error put that error in ValidationMessage).
When processing completes for all of the records in that data, update the record in Spreadsheets with the status (Passed, Failed, etc.). At this point you can notify the user that processing has completed, whether via email or some notification on the website (something like Facebook's notification bar, for example).
When the user checks the page again to see the status of the processing, they can see the result of the processing including any specific error messages about specific records in the data.
This is all very high-level, of course. There are a lot of assumptions here that would need to be tweaked for your own setup/architecture. The main point is the asynchronous nature of the whole thing. It's entirely possible to do all of this synchronously as well, which just eliminates the steps of telling the user that it's begun and notifying them when it's complete. But synchronous processing for something like this runs the risk of timing out (or at least presenting a bad user experience) for long-running tasks.
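As a rough illustration of the conversion step mentioned above, here is a minimal sketch using the DocumentFormat.OpenXml SDK, assuming the upload is an .xlsx file whose first row is a header. Empty-cell handling is simplified (OpenXML omits empty cells from a row, which can shift values into the wrong columns), so treat this as a starting point rather than a drop-in implementation.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SpreadsheetReader
{
    // Reads the first worksheet part of an .xlsx stream into a DataTable.
    // Assumptions: first row is the header, no empty cells, values fit as strings.
    public static DataTable ToDataTable(Stream uploadedFile)
    {
        var table = new DataTable();
        using (var document = SpreadsheetDocument.Open(uploadedFile, false))
        {
            WorkbookPart workbookPart = document.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
            SharedStringTablePart sharedStrings = workbookPart.SharedStringTablePart;

            var rows = worksheetPart.Worksheet.Descendants<Row>().ToList();
            if (rows.Count == 0) return table;

            // First row is assumed to be the header.
            foreach (Cell headerCell in rows[0].Elements<Cell>())
                table.Columns.Add(GetCellText(headerCell, sharedStrings));

            foreach (Row row in rows.Skip(1))
            {
                var values = row.Elements<Cell>()
                                .Select(c => GetCellText(c, sharedStrings))
                                .ToArray();
                table.Rows.Add(values);
            }
        }
        return table;
    }

    // Resolves the displayed text of a cell; shared strings are looked up in the
    // shared string table, everything else is returned as the raw cell value.
    private static string GetCellText(Cell cell, SharedStringTablePart sharedStrings)
    {
        string value = cell.CellValue == null ? string.Empty : cell.CellValue.Text;
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString && sharedStrings != null)
            value = sharedStrings.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
        return value;
    }
}
```

From the resulting DataTable you can map each row into a SpreadsheetRecords entry and let the background process apply the business-rule validation described above.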

Related

Call method inside Workflow's IF ChartFlow

I am using Windows Workflow Foundation 4 and as part of a more complicated scenario I have these two states:
The idea is that users are allowed to upload files, which is part of the whole workflow. There are several reasons why an uploaded file may not be processed immediately. One of them is that a file uploaded earlier by a certain user is currently being processed, so every other file uploaded during that processing should be in the WaitingProcessing state. However, when I enter the WaitingProcessing state I need to check this. In order to do that I have to implement something like this:
where HasFileToProcess is a function which will call a stored procedure in the database to check whether there is currently a file for this user in the Processing state.
Almost all parts of the task are clear; the one thing I don't know how to do is call a function inside the condition field. In fact I have almost zero experience with Windows Workflow at all, so I'm not even sure this is the way to go. As a sub-question, I would appreciate it if someone could show me whether there is a better way to implement this kind of logic.
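For reference, HasFileToProcess itself would be roughly the following (the stored procedure name and connection string here are just placeholders):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class FileProcessingChecks
{
    // Returns true when the given user already has a file in the Processing state.
    // "dbo.HasFileInProcessing" and the connection string are placeholders.
    public static bool HasFileToProcess(int userId)
    {
        using (var connection = new SqlConnection("<connection string>"))
        using (var command = new SqlCommand("dbo.HasFileInProcessing", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@UserId", userId);
            connection.Open();
            return Convert.ToBoolean(command.ExecuteScalar());
        }
    }
}
```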

ASP.Net MVC Long Running Process

I have a requirement to produce a report screen for different financial periods. As this is quite a large data set with a lot of rules, the process could take a long time to run (well over an hour for some of the reports to return).
What is the best way of handling this scenario inside MVC?
I am concerned about:
screen locking
performance
usability
the request timing out
Those are indeed valid concerns.
As some of the commenters have already pointed out: if the reports do not depend on input from the user, then you might want to generate the reports beforehand, say, on a nightly basis.
On the other hand, if the reports do depend on input from the user, you can circumvent your concerns in a number of ways, but you should at least split the operation into multiple steps:
Have a request from the browser kick off the process of generating the report. You could start a new thread and tell it to generate the report, or you could put a "Create report" message on a queue and have a service consume messages and generate reports. Whatever you do, make sure this first request finishes quickly. It should return some kind of identifier identifying the task just started. At this time, you can inform the user that the system is processing the request.
Use Ajax to repeatedly poll the server for completion of the report using the given identifier. Preferably, the process generating the report should report its progress and this information should be provided to the user via the Ajax polling. If you want to get fancy, you could use SignalR to notify the browser of progress.
Once the report is ready, return a link to the user where he/she can access the report.
Depending on how you implement this, the user may be able to close the browser, have a sip of coffee and come back to a completed report.
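To make the first two steps concrete, here is a minimal sketch of the kick-off/poll pattern in an MVC controller. The controller name, the in-memory status dictionary and GenerateReport are all made up for illustration; in a real system the status would live in a database or queue, since a background Task inside the web app can be lost when the application pool recycles.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Web.Mvc;

public class ReportController : Controller
{
    // Hypothetical in-memory tracker; real status should be persisted.
    private static readonly ConcurrentDictionary<Guid, string> Status =
        new ConcurrentDictionary<Guid, string>();

    [HttpPost]
    public ActionResult Start(int periodId)
    {
        var id = Guid.NewGuid();
        Status[id] = "Running";

        // Kick off the long-running work and return to the browser immediately.
        Task.Run(() =>
        {
            try
            {
                GenerateReport(periodId);   // made-up long-running method
                Status[id] = "Completed";
            }
            catch (Exception)
            {
                Status[id] = "Failed";
            }
        });

        return Json(new { reportId = id });
    }

    // Polled from the browser via Ajax until the status is Completed or Failed.
    [HttpGet]
    public ActionResult GetStatus(Guid reportId)
    {
        string status;
        if (!Status.TryGetValue(reportId, out status))
            status = "Unknown";
        return Json(new { status }, JsonRequestBehavior.AllowGet);
    }

    private void GenerateReport(int periodId) { /* heavy report generation goes here */ }
}
```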
If your app is running on Windows Server with IIS, your ASP.NET code can create a record in a db table which means that a report should be created.
Then you can use a Windows Service or console app, which might be running on the same server, constantly checking whether there are any new rows in the table. This service would create the report and, during creation, update a table field to indicate progress.
Your ASP.NET page might display a progress bar, getting the progress indication from the db using Ajax requests or simply refreshing the page every few seconds.
If you are running in the Windows Azure cloud you might use a WebWorker instead of a Windows Service.
For screen locking on your page you may use the jQuery BlockUI library.

How to read this large text file? Memory Mapped File?

I am in the design phase of a simple tool I want to write in which I need to read large log files. To give you some context, I will first explain something about it.
The log files I need to read consists of log entries which always consist of the following 3-line format:
statistics : <some data which is more or less the same length, about 100 chars>
request : <some xml string which can be small (10KB) or big (25MB) and anything in between>
response : <ditto>
The log files can be about 100-600MB in size, which means a lot of log entries. These log entries can have a relation with each other; because of this I need to start reading the file from the end to the beginning. These relationships can be deduced from the statistics line.
I want to use the info in the statistics line to build up a datagrid which the users can use to search through the data and do some filtering operations. I don't want to load the request/response lines into memory until the user actually needs them. In addition, I want to keep the memory load small by limiting the maximum number of loaded request/response entries.
So I think I need to save the offsets of the statistics lines when I am parsing the file for the first time, creating an index of statistics. Then when the user clicks on some statistic, which is an element of a log entry, I read the request/response from the file using this offset. I can then hold it in some memory pool which ensures that there are not too many loaded request/response entries (see the earlier requirement).
The problem is that I don't know how often the user is going to need the request/response data. It could be a lot, it could be a few times. In addition, the log file could be loaded from a network share.
The question I have is:
Is this a scenario where you should use a memory-mapped file, given that there could be a lot of read operations? Or is it better to use a plain FileStream? BTW, I don't need write operations on the log file at this stage, but that could change in the future!
If you have other tips or see flaws in my thinking so far please let me know as well. I am open to any approach.
Update:
To clarify some more:
The tool itself has to do the parsing when the user loads a log file from a drive or network share.
The tool will be written as WinForms application.
The user can export a selection of log entries. At this moment the format of this export is unknown (binary, file db, text file). This export can be imported by the application itself, which then only shows the selection made by the user.
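To make the offset-index idea concrete, here is a rough sketch of what I have in mind using a plain FileStream (the type names are made up, and the byte-by-byte line reader is only for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class LogIndexEntry
{
    public string Statistics { get; set; }    // the statistics line itself
    public long RequestOffset { get; set; }   // byte offset of the request line
}

public static class LogIndexer
{
    // First pass: remember every statistics line and the offset of the line after it.
    public static List<LogIndexEntry> BuildIndex(string path)
    {
        var index = new List<LogIndexEntry>();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            string line;
            while ((line = ReadLine(stream)) != null)
            {
                if (line.StartsWith("statistics :"))
                    index.Add(new LogIndexEntry { Statistics = line, RequestOffset = stream.Position });
            }
        }
        return index;
    }

    // Later, when the user clicks a row: seek straight to the stored offset.
    public static string ReadRequest(string path, long requestOffset)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            stream.Seek(requestOffset, SeekOrigin.Begin);
            return ReadLine(stream);
        }
    }

    // Minimal line reader that keeps the stream position accurate
    // (StreamReader buffers internally, so its position cannot be used for offsets).
    private static string ReadLine(FileStream stream)
    {
        var bytes = new List<byte>();
        int b;
        while ((b = stream.ReadByte()) != -1 && b != '\n')
            if (b != '\r') bytes.Add((byte)b);
        if (b == -1 && bytes.Count == 0) return null;
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```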
You're talking about some stored data that has defined relationships between actual entries... Maybe it's just me, but this scenario just calls for some kind of relational database. I'd suggest considering a portable db, like SQL Server CE for instance. It'll make your life much easier and provide exactly the functionality you need. If you use a db instead, you can query exactly the data you need, without ever needing to handle large files like this.
If you're sending the request/response chunk over the network, the network send() time is likely to be so much greater than the difference between seek()/read() and using memmap that it won't matter. To really make this scale, a simple solution is to just break up the file into many files, one for each chunk you want to serve (since the "request" can be up to 25 MB). Then your HTTP server will send that chunk as efficiently as possible (perhaps even using zero-copy, depending on your web server). If you have many small "request" chunks and only a few giant ones, you could break out only the ones past a certain threshold.
I don't disagree with the answer from walther. I would go with a db or all in memory.
Why are you so concerned about saving memory, when 600 MB is not that much? Are you going to be running on machines with less than 2 GB of memory?
Load it into a dictionary with the statistics line as the key and, as the value, a class with two properties: request and response. Dictionary is fast. LINQ is powerful and fast.
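A quick sketch of that idea, assuming the three-line format from the question (the LogEntry class and the parsing are simplified, with no handling of malformed or duplicate entries):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class LogEntry
{
    public string Request { get; set; }
    public string Response { get; set; }
}

public static class LogLoader
{
    // Reads the whole log into memory, keyed by the statistics line.
    public static Dictionary<string, LogEntry> Load(string path)
    {
        var entries = new Dictionary<string, LogEntry>();
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (!line.StartsWith("statistics :")) continue;

                // The next two lines are assumed to be the request and the response.
                entries[line] = new LogEntry
                {
                    Request = reader.ReadLine(),
                    Response = reader.ReadLine()
                };
            }
        }
        return entries;
    }
}
```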

Real-time data logging design approach?

I am working on a real-time trace logging application in WPF. I am reading logged data from a UDP port and converting it into my model class. The thing I am worried about is that if the user keeps the application open for a long period of time, the application will use a large amount of memory. I display the log information in a scrollable list, so the user can scroll up and get to a previous log entry. I'm looking for a design approach that delivers the best results with optimal use of memory. So which is the best design approach for this kind of application?
"Real Time" mean as soon as data is available Application should pick up it and display. No other way.
You can consider something like cleanup of the already previewed logging information if this is appropriate from user perspectives and load historical data on demand.
Also one of the possible solutions is to optimize LogInformationdata model so entities which you are displaying would require less memory, this could be significant improvement considering that a lot of entries are displayed and each single saved byte may result in MegaBytes of saved memory, so please share some code of entities which are bound to UI and indicate which fields/properties really need to be displayed to end user
For some kind of data you can implement Lazy Loading and request data from DB/file system on demand. For instance when an user opening Details form for a particular LogInfo entry in the UI list you are requesting advanced information like full description and so on, so you do not need to keep it always in memory whilst user do not request it by opening "More Details" form
If DB calls is high cost for your Applicaiton you can store some information on the file system in serialized format and load it On Demand in Lazy Loading manner.
Really it is hard to suggest something concrete without of knowledge of the Use Cases and standard workflows. So please provide more details how this applicaiton is used by an user so more ideas coudl come in.
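As a small sketch of the cleanup idea above: a capped ObservableCollection that evicts the oldest entries once a limit is reached, so the UI-bound list cannot grow without bound (the LogInfo shape and the cap are just placeholders; in WPF the collection must still be updated on the UI thread):

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical log item; keep it as small as possible, per the advice above.
public class LogInfo
{
    public DateTime Timestamp { get; set; }
    public string Message { get; set; }
}

// ObservableCollection that drops the oldest entries once a cap is reached.
public class BoundedLogCollection : ObservableCollection<LogInfo>
{
    private readonly int _maxItems;

    public BoundedLogCollection(int maxItems)
    {
        _maxItems = maxItems;
    }

    protected override void InsertItem(int index, LogInfo item)
    {
        base.InsertItem(index, item);
        while (Count > _maxItems)
            RemoveAt(0);   // evict the oldest entry; it could be persisted to disk/db first
    }
}
```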
One idea that might work is to add a "listener" that would see if the program was altered, or, after let's say 5 minutes of seeming idle, a popup would ask if you want to continue. It would then reconnect once the user clicks "ok".
Would that work? I am assuming that the memory use is on the log server.
It seems like you have too much data to display. You can use UI virtualization. VirtualizingStackPanel can do what you want by not creating UI elements for all your log lines until the user scrolls to them. An example would be too long for Stack Overflow. There are plenty of examples on the web.
http://bea.stollnitz.com/blog/?p=338
On the other hand, if your memory requirements are too high just because there is too much log data, consider writing it to a database on disk.

Queue file operations for later when file is locked

I am trying to implement a file-based autoincrement identity value (an int value stored in a TXT file) and I am trying to come up with the best way to handle concurrency issues. This identity will be used as the unique ID for my content. When saving new content this file gets opened, the value gets read and incremented, the new content is saved, and the incremented value is written back to the file (whether we store the next available ID or the last issued one doesn't really matter). While this is being done, another process might come along and try to save new content. The previous process opens the file with FileShare.None, so no other process will be able to read the file until it is released by the first process. While the odds of this happening are minimal, it could still happen.
Now when this does happen we have two options:
wait for the file to become available -
Emulate waiting on File.Open in C# when file is locked
we are talking about milliseconds here, so I guess this wouldn't be an issue; but if something strange happens and the file never becomes available, this solution would result in an infinite loop, so it's not an ideal solution
implement some sort of queue and run all file operations through it. My user experience requirements are such that, at the time of saving/modifying files, the user should never be informed about exceptions or that something went wrong - they would be informed about them through a very friendly user interface later, if the operations fail on the queue too.
At the moment of writing this, the solution should work within an ASP.NET MVC application (both synchronously and asynchronously through AJAX) but, if possible, it should use concepts that could also work in a Silverlight, Windows Forms or WPF application.
With regards to those two options which one do you think is better and for the second option what are possible technologies to implement this?
The ReaderWriterLockSlim class seems like a good solution for synchronizing access to the shared resource.
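A minimal sketch of that approach, guarding the read-increment-write cycle with a ReaderWriterLockSlim (the file path is a placeholder). Note that this only synchronizes threads within a single process, such as one ASP.NET application; it does not protect against a separate process opening the file at the same time.

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileIdentity
{
    private static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();
    private static readonly string CounterPath = "counter.txt";   // placeholder path

    // Returns the next identity value, incrementing the value stored in the file.
    public static int NextId()
    {
        Lock.EnterWriteLock();
        try
        {
            int current = File.Exists(CounterPath)
                ? int.Parse(File.ReadAllText(CounterPath))
                : 0;
            int next = current + 1;
            File.WriteAllText(CounterPath, next.ToString());
            return next;
        }
        finally
        {
            Lock.ExitWriteLock();
        }
    }
}
```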
