I have the goal of uploading a Products CSV of ~3000 records to my e-commerce site. I want to utilise the REST API that my e-comm platform provides so I have something I can re-use and build upon for future sites that I may create.
The main issue I am having trouble working through is:
- System.Threading.ThreadAbortException
which I can only attribute to how long it takes to push all 3K records through individual POST requests. My code:
public ActionResult WriteProductsFromFile()
{
    string fileNameIN = "19107.txt";
    string fileNameOUT = "19107_output.txt";
    string jsonUrl = "/api/products";
    List<string> ls = new List<string>();

    var engine = new FileHelperAsyncEngine<Prod1>();
    using (engine.BeginReadFile(fileNameIN))
    {
        // Serialize each non-empty product record to JSON.
        foreach (Prod1 prod in engine)
        {
            outputProduct output = new outputProduct();
            if (!string.IsNullOrEmpty(prod.name))
            {
                output.product.name = prod.name;
                string productJson = JsonConvert.SerializeObject(output);
                ls.Add(productJson);
            }
        }
    }

    // POST every record individually - this is the slow part.
    foreach (string s in ls)
        nopApiClient.Post(jsonUrl, s);

    return RedirectToAction("GetProducts");
}
Since I'm new to web-coding, am I going about this the wrong way? Is there a preferred way to bulk-upload that I haven't come across?
I've attempted to use the TaskCreationOptions.LongRunning flag, which helps the cause slightly but doesn't get me anywhere near my goal.
Web and API controller actions are not meant to do long-running work - besides tying up the request thread, you will be introducing a series of opportunities for failure that you will have little recourse in recovering from.
But it's not all bad: you have a lot of options here, and there is a lot of literature on async/cloud architecture which explains how to deal with files and these sorts of scenarios.
What you want to do is disconnect the processing of your file from the API request (in your application, not the 3rd party's).
It will take a little more work but will ultimately create a more reliable application.
Step 1:
Drop the file straight to disk. I see you have the file on disk already - I'm not sure how it gets there, but either way it will work out the same.
Step 2:
Use a process running as
- a console app (easiest)
- a service (requires some sort of install/uninstall of the service)
- or even a thread in your web app (but you will struggle to know when it fails)
Whichever way you choose, the process will watch a directory for file changes; when there is a change, it will kick off your method to process the file as you like.
Check out FileSystemWatcher; here is a basic example: https://www.dotnetperls.com/filesystemwatcher
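Below is a minimal sketch of such a watcher as a console app. The drop-folder path, the file filter and the ProcessFile method are assumptions standing in for your own import logic; in a real app you would also want to make sure the file has finished being written before you start reading it.

using System;
using System.IO;

class ProductFileWatcher
{
    static void Main()
    {
        // Watch a drop folder for new product files (path and filter are placeholders).
        var watcher = new FileSystemWatcher(@"C:\ProductDrop", "*.txt");
        watcher.Created += (sender, e) =>
        {
            Console.WriteLine("Picked up " + e.FullPath);
            ProcessFile(e.FullPath);
        };
        watcher.EnableRaisingEvents = true;

        Console.WriteLine("Watching for files. Press Enter to exit.");
        Console.ReadLine();
    }

    static void ProcessFile(string path)
    {
        // Your existing read-with-FileHelpers-and-POST logic goes here,
        // exactly as in WriteProductsFromFile, minus the controller plumbing.
    }
}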
Additionally:
If you are interested in running a thread in your Api/Web app, take a look at https://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx for some options.
You don't have to use a FileSystemWatcher, of course; you could also trigger via a flag in a DB that is checked periodically, or via a system event.
Related
I am currently working on a project to build an integration between an existing ASP.Net MVC website and a file hosting service my company is using. The typical use case is:
A user requests one or more files
The controller makes one call per file to the file host API
The file host returns the file data to the controller
The controller returns a file result
The hosting service can handle concurrent calls, and I've found that executing each API call within a task (see example below) leads to fairly drastic improvements.
private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
    var tasks = identifiers.Select(x => RetrieveDocument(results, x)).ToArray();
    Task.WaitAll(tasks);
}

private Task RetrieveDocument(List<FileHostResult> results, DocumentIdentifier x)
{
    return Task.Run(() =>
    {
        var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
        lock (results)
        {
            results.Add(result);
        }
    });
}
My question is whether or not there is a better way of doing this, or if there are any potential pitfalls I might run into? (eg. locking server resources, etc).
EDIT 1: I didn't post the code for GetFileHostResultFromFileHost because I don't really have any access to change it. It's basically a method call implemented in a library I can't change.
EDIT 2: To clarify: my main concern is to avoid harming the current user experience on the site. To that end, I want to make sure that running tasks concurrently out of an ASP.NET MVC action isn't going to lock up the site.
You should use Microsoft's Reactive Framework for this. It is ideally suited to this kind of processing.
Here's the code:
IObservable<FileHostResult> query =
    from i in identifiers.ToObservable()
    from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
    select r;

IList<FileHostResult> results = query.ToList().Wait();
That's it. It properly schedules the code on the optimum number of threads.
If you want awaitable code then you can do this:
IObservable<FileHostResult> query =
    from i in identifiers.ToObservable()
    from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
    select r;

IList<FileHostResult> results = await query.ToList();
It's really very simple and easy to code.
NuGet "System.Reactive" and then add using System.Reactive.Linq; to your code.
It is hard to give great advice without seeing the rest of the source code. But based on what I can see I'd suggest an approach like:
private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
    results.AddRange(identifiers.AsParallel().Select(x => RetrieveDocument(x)));
}

private FileHostResult RetrieveDocument(DocumentIdentifier x)
{
    var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
    return result;
}
The advantages of this approach:
No explicit use of Task.Run - let AsParallel take care of that for you.
No need for locking the results list - let AsParallel and Select take care of that for you.
You may also wish to increase the maximum number of connections you have access to.
Being honest though, I think you should look at approaches that don't require new tasks at all - likely by using async HTTP download calls, which you can run in parallel without the overhead of a thread per call.
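For illustration, here is a rough sketch of that thread-free approach using HttpClient and Task.WhenAll. It assumes the file host also exposes a plain HTTP endpoint you can call directly; the URL format and the FileHostResult property names are invented, so adapt them to the real API.

private static readonly HttpClient httpClient = new HttpClient();

private async Task<List<FileHostResult>> RetrieveDocumentsAsync(DocumentIdentifier[] identifiers)
{
    // Start every download immediately; Task.WhenAll waits for all of them
    // without dedicating a thread to each request.
    var tasks = identifiers.Select(async id => new FileHostResult
    {
        // Hypothetical properties and URL - replace with the real file host API.
        Identifier = id,
        Content = await httpClient.GetByteArrayAsync(
            "https://filehost.example.com/files/" + id.ExternalIdentifier)
    });

    return (await Task.WhenAll(tasks)).ToList();
}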
I'm trying to implement a performance monitoring tool; I want to monitor basic things such as memory and CPU.
I am attempting to do so using Performance Counters, as I believe this is what Task Manager uses behind the scenes too. I have no idea how Task Manager manages it, however, because to me it seems to take a VERY long time to retrieve process data using this method:
class Program
{
    static void Main(string[] args)
    {
        while (true)
        {
            // Materialise the counters up front; a lazy Select would re-create
            // every PerformanceCounter again when the count is taken below.
            var pcs = Process.GetProcesses()
                .Select(p => new PerformanceCounter("Process", "Working Set - Private", p.ProcessName))
                .ToList();

            var sw = Stopwatch.StartNew();
            foreach (var pc in pcs)
                pc.NextValue();

            Console.WriteLine($"Time taken to read {pcs.Count} performance counters: {sw.ElapsedMilliseconds}ms");
            Thread.Sleep(1000);
        }
    }
}
Has anyone got any suggestions on how to do this or how even Task Manager or Process Explorer is able to do this?
How does Task Manager do it?
It uses calls to ZwQuerySystemInformation, ZwQueryInformationProcess, ZwQueryInformationThread, and so on.
Task Manager maintains a database of active processes and periodically updates it by calling ZwQuerySystemInformation(SystemProcessInformation, ...), which returns an array of SYSTEM_PROCESS_INFORMATION entries.
It adds new entries for processes it finds that are not yet in the database, removes entries for processes that have exited, and updates the info for those still alive.
SYSTEM_PROCESS_INFORMATION already contains a lot of information about each process; additional information can be obtained by opening the process and calling ZwQueryInformationProcess with the appropriate info class.
If you want to implement a performance monitoring tool without a "quantum effect" (where the measurement affects the state being measured), you need to use this ntdll API. For definitions, look at http://processhacker.sourceforge.net/doc/ntexapi_8h_source.html
Although this is undocumented, the existing functions and structures have not changed since at least Windows 2000 (so ~17 years). New versions of Windows add many new info classes, and some fields that were spare/unused in old versions can become used, but the old (legacy) parts are not changed.
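For completeness, here is a hedged sketch of what the call looks like from C# via P/Invoke (NtQuerySystemInformation is the user-mode export corresponding to ZwQuerySystemInformation). It only shows the buffer-growth call pattern; walking the returned SYSTEM_PROCESS_INFORMATION entries via their NextEntryOffset chain is omitted because the struct layout differs between x86 and x64, so treat this as an illustration rather than a drop-in implementation.

using System;
using System.Runtime.InteropServices;

static class ProcessSnapshot
{
    const int SystemProcessInformation = 5;
    const int STATUS_INFO_LENGTH_MISMATCH = unchecked((int)0xC0000004);

    [DllImport("ntdll.dll")]
    static extern int NtQuerySystemInformation(
        int systemInformationClass,
        IntPtr systemInformation,
        int systemInformationLength,
        out int returnLength);

    // Returns a buffer holding the SYSTEM_PROCESS_INFORMATION array; the caller
    // walks the NextEntryOffset chain and frees the buffer with Marshal.FreeHGlobal.
    public static IntPtr TakeSnapshot(out int bufferSize)
    {
        bufferSize = 0x10000;
        for (;;)
        {
            IntPtr buffer = Marshal.AllocHGlobal(bufferSize);
            int status = NtQuerySystemInformation(SystemProcessInformation, buffer, bufferSize, out int needed);
            if (status == 0)
                return buffer;

            Marshal.FreeHGlobal(buffer);
            if (status != STATUS_INFO_LENGTH_MISMATCH)
                throw new InvalidOperationException("NtQuerySystemInformation failed: 0x" + status.ToString("X8"));

            // The required size changes as processes come and go, so retry with a bigger buffer.
            bufferSize = Math.Max(needed, bufferSize * 2);
        }
    }
}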
Dilemma, dilemma...
I've been working up a solution to a problem that uses async calls to the HttpClient library (GetAsync => ConfigureAwait(false), etc.). In a console app, my dll is very responsive, and the mixture of the async/await calls and Parallel.ForEach(=>) really makes me glow.
Now for the issue. After moving from this test harness to the target app, things have become problematic. I'm using ASP.NET MVC 4 and have hit a few issues. The main issue really is that calling my process on a controller action actually blocks the main thread until the async actions are complete. I've tried using an async controller pattern, I've tried using Task.Factory, I've tried using new Threads. You name it, I've tried all the flavours - and then some!
Now, I appreciate that the nature of http is not designed to facilitate long processes like this and there are a number of articles here on SO that say don't do it. However, there are mitigating reasons why i NEED to use this approach. The main reason that I need to run this in mvc is due to the fact that I actually update the live data cache (on the mvc app) in realtime via raising an event in my dll's code. This means that fragments of the 50-60 data feeds can be pushed out live before the entire async action is complete. Therefore, client apps can receive partial updates within seconds of the async action being instigated. If I were to delegate the process out to a console app that ran the entire process in the background, I'd no longer be able to harness those fragment partial updates and this is the raison d'etre behind the entire choice of this architecture.
Can anyone shed light on a solution that would allow me to mitigate the blocking of the thread, whilst at the same time allowing each async fragment to be consumed by my object model and fed out to the client apps (I'm using SignalR to make these client updates)? A kind of nirvana would be a scenario where an out-of-process cache object could be shared between numerous processes - the cache update could then be triggered and consumed by my mvc process (aka - http://devproconnections.com/aspnet-mvc/out-process-caching-aspnet). And so back to reality...
I have also considered using a secondary webservice to achieve this, but would welcome other options before once again over engineering my solution (there are already many moving parts and a multitude of async Actions going on).
Sorry not to have added any code, I'm hoping for practical philosophy/insights, rather than code help on this, tho would of course welcome coded examples that illustrate a solution to my problem.
I'll update the question as we move in time, as my thinking process is still maturing on this.
[edit] - for the sake of clarity, the snippet below is my Brothers Grimm code collision (extracted from a larger body of work):
Parallel.ForEach(scrapeDataBases, new ParallelOptions()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount * 15
},
async dataBase =>
{
    await dataBase.ScrapeUrlAsync().ConfigureAwait(false);
    await UpdateData(dataType, (DataCheckerScrape)dataBase);
});
async and Parallel.ForEach do not mix naturally, so I'm not sure what your console solution looks like. Furthermore, Parallel should almost never be used on ASP.NET at all.
It sounds like what you would want is to just use Task.WhenAll.
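A rough sketch of that, reusing the ScrapeUrlAsync and UpdateData calls from the snippet above (the method signature, the element type and the dataType parameter are assumed from the question):

public async Task ScrapeAllAsync(IEnumerable<DataCheckerScrape> scrapeDataBases, DataType dataType)
{
    // Start every scrape/update pair; no request thread is blocked while the
    // HTTP calls are in flight.
    var tasks = scrapeDataBases.Select(async dataBase =>
    {
        await dataBase.ScrapeUrlAsync().ConfigureAwait(false);
        await UpdateData(dataType, dataBase).ConfigureAwait(false);
    });

    await Task.WhenAll(tasks);
}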
On a side note, I think your reasoning around background processing on ASP.NET is incorrect. It is perfectly possible to have a separate process that updates the clients via SignalR.
Since your question is pretty high level and doesn't include a lot of code, you could try Reactive Extensions.
Something like
private IEnumerable<Task<Scraper>> ScrappedUrls()
{
    // Return the 50 to 60 tasks, one per website, here.
    // I assume they all return the same type.
    // return .ScrapeUrlAsync().ConfigureAwait(false);
    throw new NotImplementedException();
}

public async Task<IEnumerable<ScrapeOdds>> GetOdds()
{
    var results = new Collection<ScrapeOdds>();
    var urlRequest = ScrappedUrls();
    var observerableUrls = urlRequest.Select(u => u.ToObservable()).Merge();
    var publisher = observerableUrls.Publish();
    var hubContext = GlobalHost.ConnectionManager.GetHubContext<OddsHub>();

    publisher.Subscribe(scraper =>
    {
        // Whatever you do to convert to the result set
        var scrapedOdds = scraper.GetOdds();
        results.Add(scrapedOdds);

        // Update anything else you want when it arrives.
        // Update SignalR here
        hubContext.Clients.All.UpdatedOdds(scrapedOdds);
    });

    // Connect starts the published sequence; awaiting the publisher will not
    // continue until every scrape has completed.
    publisher.Connect();
    await publisher;

    return results;
}
The Merge operator will process the results as they come in. You can then update the SignalR hubs, plus whatever else you need, as each result arrives. The controller action still has to wait for them all to come in; that's why there is an await on the publisher.
I don't really know whether HttpClient will be happy with 50-60 web calls all at once. If it isn't, you can turn the IEnumerable into an array and break it down into smaller chunks, as sketched below. There should also be some error checking in there. With Rx you can also tell it to SubscribeOn and ObserveOn different threads, but with everything being pretty much async, I don't think that would be necessary.
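If you do want to throttle with Rx itself, Merge has an overload that caps how many of the inner observables it subscribes to at once; note this only limits concurrency if ScrappedUrls creates its tasks lazily. The limit of 8 below is an arbitrary illustrative value.

// Subscribe to at most 8 scrape observables at a time instead of all 50-60.
var observerableUrls = urlRequest.Select(u => u.ToObservable()).Merge(8);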
I am currently working on a Windows 8 app which needs to store some tables. Currently, I am using XML files with the XDocument classes to solve the problem. It employs save and load methods using GetFileAsync and CreateFileAsync etc. Moreover, these save and load methods are called by different events. However, whenever there are repeated calls, an exception is thrown telling me that file access is denied (expected behavior - more details here!). While there are dirty methods to avoid this (like using locks and such), I am not very happy with the results; I would rather use a database. Moreover, I am planning to write another app for Windows Phone 8 (and possibly a web version) which will make use of the data.
They have been repeatedly saying that Windows 8 is cloud based. Now the question: what is the correct way to store my data? XML seems right, but it has the problems I mentioned above. What would be an ideal cloud-based solution involving Windows 8, Windows Phone 8 and possibly Azure? All I want is to store tables and make them accessible.
Sorry if the question seems unclear. I will provide information if required.
If you want to use Azure, the easiest way to proceed is Windows Azure Mobile services. It allows you to setup your database and webservices using a web interface in a few minutes.
It's quite cool, allows you to add custom javascript to your web api logic, and generates json web apis. There are client Libraries for Windows 8, Windows Phone and iOS. You could easily roll your own for any http enabled frontends.
However, be aware that taking the cloud route means your app won't work offline (unless you code a cache system, that is - and a cache will require a local DB).
About the local DB
You really have two possibilities:
1) A real DB in your app, like SQLite. It's available as a NuGet package, but right now ARM support isn't available out of the box, nor guaranteed by the team. If you don't need ARM, go try it :)
2) Plain old file storage, like you did before. I often do that myself. You will, however, get issues when accessing it from different threads (Access Denied errors).
When you store things in a local file, don't forget to lock the critical sections (i.e. when you read or write to the file) to prevent the access-denied exceptions. To be safe, encapsulate your read/write logic in a service class instance that is unique within your app (use the singleton pattern, for instance, or anything equivalent).
Now, the lock itself. I imagine that you are using async/await - I like this sweet thing too. But classic C# locks (with the lock keyword, for instance) don't work with async/await, and even if they did, blocking wouldn't be cool.
That's why the marvellous AsyncLock comes into play. It's a lock, but one which - approximately - doesn't block (you await it).
public class AsyncLock
{
    private readonly AsyncSemaphore m_semaphore;
    private readonly Task<Releaser> m_releaser;

    public AsyncLock()
    {
        m_semaphore = new AsyncSemaphore(1);
        m_releaser = Task.FromResult(new Releaser(this));
    }

    public Task<Releaser> LockAsync()
    {
        var wait = m_semaphore.WaitAsync();
        return wait.IsCompleted ?
            m_releaser :
            wait.ContinueWith((_, state) => new Releaser((AsyncLock)state),
                this, CancellationToken.None,
                TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
    }

    public struct Releaser : IDisposable
    {
        private readonly AsyncLock m_toRelease;

        internal Releaser(AsyncLock toRelease) { m_toRelease = toRelease; }

        public void Dispose()
        {
            if (m_toRelease != null)
                m_toRelease.m_semaphore.Release();
        }
    }
}
public class AsyncSemaphore
{
    private readonly static Task s_completed = Task.FromResult(true);
    private readonly Queue<TaskCompletionSource<bool>> m_waiters = new Queue<TaskCompletionSource<bool>>();
    private int m_currentCount;

    public AsyncSemaphore(int initialCount)
    {
        if (initialCount < 0) throw new ArgumentOutOfRangeException("initialCount");
        m_currentCount = initialCount;
    }

    public Task WaitAsync()
    {
        lock (m_waiters)
        {
            if (m_currentCount > 0)
            {
                --m_currentCount;
                return s_completed;
            }
            else
            {
                var waiter = new TaskCompletionSource<bool>();
                m_waiters.Enqueue(waiter);
                return waiter.Task;
            }
        }
    }

    public void Release()
    {
        TaskCompletionSource<bool> toRelease = null;
        lock (m_waiters)
        {
            if (m_waiters.Count > 0)
                toRelease = m_waiters.Dequeue();
            else
                ++m_currentCount;
        }
        if (toRelease != null)
            toRelease.SetResult(true);
    }
}
You can use it this way - I suppose that you have an AsyncLock field named blogLock (the snippet is taken from one of my own projects):
using (await blogLock.LockAsync())
{
    using (var stream = await folder.OpenStreamForReadAsync(_blogFileName))
    {
        using (var reader = new StreamReader(stream))
        {
            var json = await reader.ReadToEndAsync();
            var blog = await JsonConvert.DeserializeObjectAsync<Blog>(json);
            return blog;
        }
    }
}
I've stumbled across this thread because I have basically the exact same problem. What seems staggering to me is that Microsoft makes its own enterprise-class database product (SQL Server), which already has a couple of lightweight, embeddable versions, and yet these seemingly can't be used with Windows 8/Windows Phone 8 applications to provide a local database. And yet MySQL can!
I've tried a couple of times to dabble in writing Windows Phone 8 apps, using my ASP.NET/VB.NET/SQL experience, but I always get bogged down in trying to learn a different way to perform data operations that I can do in my sleep in a web environment, and lose interest. Why can't they make it easy to use SQL with W8/WP8 apps?
If the data pertains to the user of the device, look at using SQLite ... there is a question on Stack Overflow about SQLite and local WinRT databases here: Local database storage for WinRT/Metro applications
SQL Databases
IndexedDB in the case of Windows 8 and JavaScript development
I know this is an old question that already has an accepted answer, but I'm going to get out my soapbox and answer it anyway because I think that rather than solve the technical problem it is better to use an architecture that doesn't depend on local database facilities.
In my experience very little data requires device local database services.
Most user-generated data requiring local storage is non-roaming (i.e. device-specific) user preferences and configuration (e.g. a use-removable-storage setting). Game results fall into this category. Apps that produce larger quantities of user data are typically implemented on the desktop and almost certainly have a fast, reliable connection to the local network, making server-based storage eminently suitable even for "fat" data like Office documents.
Reference data should certainly be server based, but you might choose to cache it. Nokia Maps on Windows Phone 8 is an excellent example of cached server-based data. The cache can even be explicitly pre-loaded in anticipation of off-line use.
The world view I have just expounded has little use for a local SQL Server. If you want a query engine, use LINQ. Express your application settings and user data as an object graph and (de)serialise XML. You could even use Linq2Xml directly on the XML if you don't want to maintain ORM classes.
Data of any sort that ought to be available across all the user's devices really needs to be cloud stored anyway.
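To make the earlier point concrete, here is a small sketch of the "object graph serialised to XML, queried with LINQ" idea using plain .NET types. The Settings and ScoreEntry classes are invented for illustration, and a WinRT app would read and write through StorageFile streams rather than System.IO.File.

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

public class ScoreEntry
{
    public string Player { get; set; }
    public int Score { get; set; }
}

public class Settings
{
    public bool UseRemovableStorage { get; set; }
    public List<ScoreEntry> Scores { get; set; }

    public Settings() { Scores = new List<ScoreEntry>(); }
}

public static class SettingsStore
{
    static readonly XmlSerializer serializer = new XmlSerializer(typeof(Settings));

    public static void Save(Settings settings, string path)
    {
        using (var stream = File.Create(path))
            serializer.Serialize(stream, settings);
    }

    public static Settings Load(string path)
    {
        if (!File.Exists(path))
            return new Settings();
        using (var stream = File.OpenRead(path))
            return (Settings)serializer.Deserialize(stream);
    }
}

Querying is then plain LINQ to Objects, e.g. settings.Scores.Where(s => s.Score > 100).OrderByDescending(s => s.Score), with no database engine involved.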
To address some of akshay's comments,
Map data
Geospatial data is typically organised into structures known as quad-trees for a variety of reasons broadly to do with providing a level of detail that varies with zoom. The way these are accessed and manipulated derives considerable advantage from their representation as object graphs, and they are not updated by the users, so while this data certainly could be stored in a relational database and it probably is while it's being compiled, it certainly isn't stored or delivered that way.
LINQ is well adapted to this scenario because it can be applied directly to the quad-tree.
The data certainly is in a file. But I imagine you meant direct file access rather than indirection through another process. Probably the thought in your mind is that it is a good idea to invest significant effort on thoroughly solving the problems of concurrency and query processing once and share the solution between client apps. But this is a very heavyweight solution, and the query processing aspect is already well handled by LINQ (which is why I keep mentioning it).
Your XML problems
Read-only access doesn't need a lock, so avoid the file system locking problem by caching and using the Singleton pattern...
public static class XManager
{
    static Dictionary<string, XDocument> __cache = new Dictionary<string, XDocument>();

    public static XDocument GetXDoc(string filepath)
    {
        // Load each document once, then serve it from the cache.
        if (!__cache.ContainsKey(filepath))
        {
            __cache[filepath] = XDocument.Load(filepath);
        }
        return __cache[filepath];
    }
}
I have spent a whole day trying various ways using 'AddOnPreRenderCompleteAsync' and 'RegisterAsyncTask' but no success so far.
I succeeded in making the call to the DB asynchronous using 'BeginExecuteReader' and 'EndExecuteReader', but that misses the point. The async handling should not be for the call to the DB, which in my case is fast; it should be afterwards, during the 'while' loop, while calling an external web service.
I think the simplified pseudo code will explain best:
(Note: the connection string is using 'MultipleActiveResultSets')
private void MyFunction()
{
    // "Select ID, UserName from MyTable"
    // Open connection to DB
    ExecuteReader();

    if (DR.HasRows)
    {
        while (DR.Read())
        {
            // Call external web-service
            // and get current Temperature of each UserName - DR["UserName"].ToString()

            // Update my local DB:
            // Update MyTable set Temperature = ValueFromWebService where UserName = DR["UserName"]
            CmdUpdate.ExecuteNonQuery();
        }
        // Close connection etc
    }
}
Accessing the DB is fast. Getting the returned result from the external web service is slow, and that at least should be handled asynchronously.
If each call to the web service takes just 1 second, then assuming I have only 100 users it will take a minimum of 100 seconds for the DB update to complete, which obviously is not an option.
There eventually should be thousands of users (currently only 2).
Currently everything works, just very synchronously :)
Thoughts to myself:
Maybe my way of approaching this is wrong?
Maybe the entire process should be called asynchronously?
Many thanx
Have you considered spinning this whole thing off into its own thread?
What really is your concern?
Avoiding the long task blocking your application?
If so, you can use a thread (see BackgroundWorker).
Processing several calls to the web service in parallel to speed the whole thing up?
If so, maybe the web service can be called asynchronously with a callback. You could also use the ThreadPool or Tasks, but you'll have to wait for all your calls or tasks to complete before proceeding to the DB update.
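To sketch that last option: read the rows quickly, fan out the slow web service calls with Task.WhenAll, then write the results back. GetTemperatureAsync and the table/column names below are assumptions standing in for the real web service and schema from the question.

private async Task UpdateTemperaturesAsync(string connectionString)
{
    var users = new List<string>();

    // 1. Read the user list and close the reader before doing any slow work.
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("select UserName from MyTable", conn))
    {
        await conn.OpenAsync();
        using (var reader = await cmd.ExecuteReaderAsync())
            while (await reader.ReadAsync())
                users.Add(reader.GetString(0));
    }

    // 2. Call the slow web service for every user concurrently.
    var lookups = users.Select(async u => new { User = u, Temp = await GetTemperatureAsync(u) });
    var results = await Task.WhenAll(lookups);

    // 3. Write the results back.
    using (var conn = new SqlConnection(connectionString))
    {
        await conn.OpenAsync();
        foreach (var r in results)
        {
            using (var cmd = new SqlCommand(
                "update MyTable set Temperature = @t where UserName = @u", conn))
            {
                cmd.Parameters.AddWithValue("@t", r.Temp);
                cmd.Parameters.AddWithValue("@u", r.User);
                await cmd.ExecuteNonQueryAsync();
            }
        }
    }
}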
You should keep the database connection open for as short of a time as possible. Therefore, don't do stuff while iterating through a DataReader. Most application developers prefer to put their actual database access code on a separate layer, and in a case like this, you would return a DataTable or a typed collection to the calling code. Furthermore, if you are updating the same table you are reading from, this could result in locks.
How many users will be executing this method at once, and how often does it need to be refreshed? Are you sure you need to do this from inside the web app? You may consider using a singleton for this, in which case spinning off a couple of worker threads is totally appropriate even if it's in the web app. Another thing to consider is using a Windows Service, which I think would be more appropriate for periodically updating data from a web service, especially since it doesn't have anything to do with the current user's session.
I'd say create a thread for each web request, and do something like this:
extra functions:
int privCompleteThreads = 0;
int OpenThreads = 0;

int CompleteThreads
{
    get { return privCompleteThreads; }
    // NB: if several threads can finish at the same time, use Interlocked.Increment
    // rather than a plain increment of this property.
    set { privCompleteThreads = value; CheckDoneOperations(); }
}

void CheckDoneOperations()
{
    if (CompleteThreads == OpenThreads)
    {
        // done! all requests have finished
    }
}
in the main program:

foreach (time i need to open a request)
{
    OpenThreads = OpenThreads + 1;
    // Create thread here
}

inside the threaded function:

// do your other stuff here
// do this when the operation is done:
CompleteThreads = CompleteThreads + 1;
Now, I'm not sure how reliable this approach would be; that's up to you. But a normal web request shouldn't take a second - your browser doesn't take a second loading this page, does it? Mine loads it as fast as I can hit F5. It's just opening a stream; you could also try opening the web request once and reusing the same instance over and over, and see if that speeds things up at all.