My code below starts out great and seems to run quickly, but after a while I start getting the error message below. I realize that Entity Framework's DbContext is not thread safe, and this is probably causing the issue. Is there a way to change this code so that it keeps the same performance without the threading problems, or is there another way to speed up this process? I have over 9,000 symbols to download and insert into a database; I tried basic for loops with await, but that was extremely slow and took more than 10 times longer to achieve the same results.
public static async Task startInitialMarketSymbolsDownload(string market)
{
try
{
List<string> symbolList = new List<string>();
symbolList = getStockSymbols(market);
var historicalGroups = symbolList.Select((x, i) => new { x, i })
.GroupBy(x => x.i / 50)
.Select(g => g.Select(x => x.x).ToArray());
await Task.WhenAll(historicalGroups.Select(g => Task.Run(() => getLocalHistoricalStockData(g, market))));
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
public static async Task getLocalHistoricalStockData(string[] symbols, string market)
{
// download data for list of symbols and then upload to db tables
string symbolInfo = null;
try
{
using (financeEntities context = new financeEntities())
{
foreach (string symbol in symbols)
{
symbolInfo = symbol;
List<HistoryPrice> hList = Get(symbol, new DateTime(1900, 1, 1), DateTime.UtcNow);
var backDates = context.DailyAmexDatas.Where(c => c.Symbol == symbol).Select(d => d.Date).ToList();
List<HistoryPrice> newHList = hList.Where(c => backDates.Contains(c.Date) == false).ToList<HistoryPrice>();
foreach (HistoryPrice h in newHList)
{
DailyAmexData amexData = new DailyAmexData();
// set the data then add it
amexData.Symbol = symbol;
amexData.Open = h.Open;
amexData.High = h.High;
amexData.Low = h.Low;
amexData.Close = h.Close;
amexData.Volume = h.Volume;
amexData.AdjustedClose = h.AdjClose;
amexData.Date = h.Date;
context.DailyAmexDatas.Add(amexData);
}
// now save everything
await context.SaveChangesAsync();
Console.WriteLine(symbol + " added to the " + market + " database!");
}
}
}
catch (Exception ex)
{
Console.WriteLine(ex.InnerException?.Message ?? ex.Message); // InnerException can be null
}
}
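One common pattern in this situation is to keep one DbContext per worker (as getLocalHistoricalStockData above already does) and cap the number of batches running at once, so contexts and connections are never shared across threads and never pile up. A minimal sketch along those lines, reusing the question's helpers; the limit of 8 concurrent batches is an arbitrary example value:
var throttle = new SemaphoreSlim(8); // cap concurrent batches; 8 is an example value
public static async Task startInitialMarketSymbolsDownload(string market)
{
List<string> symbolList = getStockSymbols(market);
// one batch of 50 symbols per task, as in the original code
var historicalGroups = symbolList.Select((x, i) => new { x, i })
.GroupBy(x => x.i / 50)
.Select(g => g.Select(x => x.x).ToArray())
.ToList();
var tasks = historicalGroups.Select(async batch =>
{
await throttle.WaitAsync();
try
{
// getLocalHistoricalStockData creates its own DbContext, so nothing is shared
await getLocalHistoricalStockData(batch, market);
}
finally
{
throttle.Release();
}
});
await Task.WhenAll(tasks);
}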
I want to improve performance and remove the delay in showing the data to the user on screen. Per the requirement, I need to get a list of data from one source and then, based on that data, fetch further data from other sources; executing these calls sequentially takes a lot of time.
I am looking for suggestions to improve this: call the client asynchronously, wait for everything at the end, and reduce the overall wait time of the request.
foreach (var n in player.data)
{
var request1 = new HttpRequestMessage(HttpMethod.Get, "https://api.*****.com/buckets/" + **** + "/tests/" + n.id);
var client1 = new HttpClient();
request1.Headers.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", "****-b23a-*****-b1be-********");
HttpResponseMessage response1 = await client1.SendAsync(request1, HttpCompletionOption.ResponseHeadersRead);
List<dataroot> root1 = new List<dataroot>();
if (response1.StatusCode == System.Net.HttpStatusCode.OK)
{
try
{
var apiString1 = await response1.Content.ReadAsStringAsync();
var player1 = Newtonsoft.Json.JsonConvert.DeserializeObject<envRoot>(apiString1);
if (!string.IsNullOrEmpty(player1.data.environments[0].parent_environment_id))
{
player.data.Where(x => x.id == player1.data.environments[0].test_id).ToList().ForEach(s => s.isShared = true);
player.data.Where(x => x.id == player1.data.environments[0].test_id).ToList().ForEach(s => s.sharedEnvironmentId = player1.data.environments[0].parent_environment_id);
//player.data.Where(x=>x.id==player1.data.environments[0].test_id).ToList().ForEach(s=>s.sharedEnvironmentId=player1.data.environments[0].test_id);
}
player.data.Where(x => x.id == player1.data.environments[0].test_id).ToList().ForEach(s => s.normalenvironmentId = player1.data.environments[0].id);
}
catch (Exception ex)
{
var test = ex;
}
}
}
You can try the way I did in my sample below:
https://github.com/rajabb/RunningLongRunningTasksEfficientlyAndWaitAtEnd
The main part of the code is:
List<Task> tasks = new List<Task>();
for (int i = 0; i < 100; i++)
{
tasks.Add(LongRunningTask.RunAsync(i.ToString()));
}
await Task.WhenAll(tasks.ToArray());
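Applied to the loop in the question, the same idea means starting every request first and awaiting them all together at the end. A rough sketch under the question's own types (envRoot, player); bucketKey and authToken are placeholders for the redacted values, and error handling is elided:
// bucketKey and authToken stand in for the redacted values in the question
var client = new HttpClient(); // one shared client instead of one per iteration
client.DefaultRequestHeaders.Authorization =
new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", authToken);
var tasks = player.data.Select(async n =>
{
var response = await client.GetAsync(
"https://api.*****.com/buckets/" + bucketKey + "/tests/" + n.id,
HttpCompletionOption.ResponseHeadersRead);
if (response.StatusCode == System.Net.HttpStatusCode.OK)
{
var apiString = await response.Content.ReadAsStringAsync();
var player1 = Newtonsoft.Json.JsonConvert.DeserializeObject<envRoot>(apiString);
// same per-item updates as in the question; each lambda only touches
// the entries whose id matches player1.data.environments[0].test_id
}
}).ToList();
await Task.WhenAll(tasks);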
I have an SSIS package that's launching another SSIS package in a Foreach container; because the container reports completion as soon as it launched all the packages it had to launch, I need a way to make it wait until all "child" packages have completed.
So I implemented a little sleep-wait loop that basically pulls the Execution objects off the SSISDB for the ID's I'm interested in.
The problem I'm facing is that a grand total of 0 Dts.Events.FireProgress events get fired, and if I uncomment the Dts.Events.FireInformation call in the do loop, then every second I get a message saying 23 packages are still running... yet if I check SSISDB's Active Operations window, I see that most have already completed and only 3 or 4 are actually running.
What am I doing wrong, why wouldn't runningCount contain the number of actually running executions?
using ssis = Microsoft.SqlServer.Management.IntegrationServices;
public void Main()
{
const string serverName = "REDACTED";
const string catalogName = "SSISDB";
var ssisConnectionString = $"Data Source={serverName};Initial Catalog=msdb;Integrated Security=SSPI;";
var ids = GetExecutionIDs(serverName);
var idCount = ids.Count();
var previousCount = -1;
var iterations = 0;
try
{
var fireAgain = true;
const int secondsToSleep = 1;
var sleepTime = TimeSpan.FromSeconds(secondsToSleep);
var maxIterations = TimeSpan.FromHours(1).TotalSeconds / sleepTime.TotalSeconds;
IDictionary<long, ssis.Operation.ServerOperationStatus> catalogExecutions;
using (var connection = new SqlConnection(ssisConnectionString))
{
var server = new ssis.IntegrationServices(connection);
var catalog = server.Catalogs[catalogName];
do
{
catalogExecutions = catalog.Executions
.Where(execution => ids.Contains(execution.Id))
.ToDictionary(execution => execution.Id, execution => execution.Status);
var runningCount = catalogExecutions.Count(kvp => kvp.Value == ssis.Operation.ServerOperationStatus.Running);
System.Threading.Thread.Sleep(sleepTime);
//Dts.Events.FireInformation(0, "ScriptMain", $"{runningCount} packages still running.", string.Empty, 0, ref fireAgain);
if (runningCount != previousCount)
{
previousCount = runningCount;
decimal completed = idCount - runningCount;
decimal percentCompleted = completed / idCount;
Dts.Events.FireProgress($"Waiting... {completed}/{idCount} completed", Convert.ToInt32(100 * percentCompleted), 0, 0, "", ref fireAgain);
}
iterations++;
if (iterations >= maxIterations)
{
Dts.Events.FireWarning(0, "ScriptMain", $"Timeout expired, requesting cancellation.", string.Empty, 0);
Dts.Events.FireQueryCancel();
Dts.TaskResult = (int)Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Canceled;
return;
}
}
while (catalogExecutions.Any(kvp => kvp.Value == ssis.Operation.ServerOperationStatus.Running));
}
}
catch (Exception exception)
{
if (exception.InnerException != null)
{
Dts.Events.FireError(0, "ScriptMain", exception.InnerException.ToString(), string.Empty, 0);
}
Dts.Events.FireError(0, "ScriptMain", exception.ToString(), string.Empty, 0);
Dts.Log(exception.ToString(), 0, new byte[0]);
Dts.TaskResult = (int)ScriptResults.Failure;
return;
}
Dts.TaskResult = (int)ScriptResults.Success;
}
The GetExecutionIDs function simply returns all execution ID's for the child packages, from my metadata database.
The problem is that you're re-using the same connection at every iteration. Turn this:
using (var connection = new SqlConnection(ssisConnectionString))
{
var server = new ssis.IntegrationServices(connection);
var catalog = server.Catalogs[catalogName];
do
{
catalogExecutions = catalog.Executions
.Where(execution => ids.Contains(execution.Id))
.ToDictionary(execution => execution.Id, execution => execution.Status);
Into this:
do
{
using (var connection = new SqlConnection(ssisConnectionString))
{
var server = new ssis.IntegrationServices(connection);
var catalog = server.Catalogs[catalogName];
catalogExecutions = catalog.Executions
.Where(execution => ids.Contains(execution.Id))
.ToDictionary(execution => execution.Id, execution => execution.Status);
}
And you'll get the correct execution status every time. I'm not sure why the connection can't be reused, but keeping connections as short-lived as possible is always a good idea, and this is one more case that shows why.
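For completeness, the full polling loop with the connection scoped per iteration would look roughly like this (progress reporting and the timeout check from the question elided):
do
{
// a fresh connection per poll, so catalog.Executions reflects the current status
using (var connection = new SqlConnection(ssisConnectionString))
{
var server = new ssis.IntegrationServices(connection);
var catalog = server.Catalogs[catalogName];
catalogExecutions = catalog.Executions
.Where(execution => ids.Contains(execution.Id))
.ToDictionary(execution => execution.Id, execution => execution.Status);
}
System.Threading.Thread.Sleep(sleepTime);
// ... FireProgress / timeout handling as in the question ...
}
while (catalogExecutions.Any(kvp => kvp.Value == ssis.Operation.ServerOperationStatus.Running));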
Please help me with restoring delayed (and persisted) workflows.
I'm trying to check a self-hosted workflow store for any instance that was delayed and can be resumed. For testing purposes I've created a dummy activity that delays and persists on the delay.
Generally, the resume process looks like this:
get the WF definition
configure the SQL instance store
call WaitForEvents
check whether there is an event named HasRunnableWorkflowEvent.Value, and if there is,
create a WorkflowApplication object and call LoadRunnableInstance
This works fine if the store is created/initialized, WaitForEvents is called, and the store is closed. In that case the store reads all available workflows from the persistence DB and throws a timeout exception once there are no workflows left to resume.
The problem happens if the store is created once and a loop then just keeps calling WaitForEvents (the same thing happens with BeginWaitForEvents). In that case it reads all available workflows from the DB (with the proper IDs), but then, instead of a timeout exception, it tries to read one more instance (I know exactly how many workflows are ready to be resumed because I'm using a separate test database). The read fails and throws InstanceNotReadyException. In the catch I check workflowApplication.Id, but no instance with that ID was ever saved by my test.
I've tried running against a new (empty) persistence database and the result is the same :(
This code fails:
using (var storeWrapper = new StoreWrapper(wf, connStr))
for (int q = 0; q < 5; q++)
{
var id = Resume(storeWrapper); // InstanceNotReadyException here when all activities is resumed
But this one works as expected:
for (int q = 0; q < 5; q++)
using (var storeWrapper = new StoreWrapper(wf, connStr))
{
var id = Resume(storeWrapper); // timeout exception here or beginWaitForEvents continues to wait
What is the best solution in such a case? Add an empty catch for InstanceNotReadyException and ignore it?
Here are my tests
const int delayTime = 15;
string connStr = "Server=db;Database=AppFabricDb_Test;Integrated Security=True;";
[TestMethod]
public void PersistOneOnIdleAndResume()
{
var wf = GetDelayActivity();
using (var storeWrapper = new StoreWrapper(wf, connStr))
{
var id = CreateAndRun(storeWrapper);
Trace.WriteLine(string.Format("done {0}", id));
}
using (var storeWrapper = new StoreWrapper(wf, connStr))
for (int q = 0; q < 5; q++)
{
var id = Resume(storeWrapper);
Trace.WriteLine(string.Format("resumed {0}", id));
}
}
Activity GetDelayActivity(string addName = "")
{
var name = new Variable<string>(string.Format("incr{0}", addName));
Activity wf = new Sequence
{
DisplayName = "testDelayActivity",
Variables = { name, new Variable<string>("CustomDataContext") },
Activities =
{
new WriteLine
{
Text = string.Format("before delay {0}", delayTime)
},
new Delay
{
Duration = new InArgument<TimeSpan>(new TimeSpan(0, 0, delayTime))
},
new WriteLine
{
Text = "after delay"
}
}
};
return wf;
}
Guid CreateAndRun(StoreWrapper sw)
{
var idleEvent = new AutoResetEvent(false);
var wfApp = sw.GetApplication();
wfApp.Idle = e => idleEvent.Set();
wfApp.Aborted = e => idleEvent.Set();
wfApp.Completed = e => idleEvent.Set();
wfApp.Run();
idleEvent.WaitOne(40 * 1000);
var res = wfApp.Id;
wfApp.Unload();
return res;
}
Guid Resume(StoreWrapper sw)
{
var res = Guid.Empty;
var events = sw.GetStore().WaitForEvents(sw.Handle, new TimeSpan(0, 0, delayTime));
if (events.Any(e => e.Equals(HasRunnableWorkflowEvent.Value)))
{
var idleEvent = new AutoResetEvent(false);
var obj = sw.GetApplication();
try
{
obj.LoadRunnableInstance(); //instancenotready here if the same store has read all instances from DB and no delayed left
obj.Idle = e => idleEvent.Set();
obj.Completed = e => idleEvent.Set();
obj.Run();
idleEvent.WaitOne(40 * 1000);
res = obj.Id;
obj.Unload();
}
catch (InstanceNotReadyException)
{
Trace.TraceError("failed to resume {0} {1} {2}", obj.Id
, obj.DefinitionIdentity == null ? null : obj.DefinitionIdentity.Name
, obj.DefinitionIdentity == null ? null : obj.DefinitionIdentity.Version);
foreach (var e in events)
{
Trace.TraceWarning("event {0}", e.Name);
}
throw;
}
}
return res;
}
Here is the store wrapper definition I'm using for the test:
public class StoreWrapper : IDisposable
{
Activity WfDefinition { get; set; }
public static readonly XName WorkflowHostTypePropertyName = XNamespace.Get("urn:schemas-microsoft-com:System.Activities/4.0/properties").GetName("WorkflowHostType");
public StoreWrapper(Activity wfDefinition, string connectionStr)
{
_store = new SqlWorkflowInstanceStore(connectionStr);
HostTypeName = XName.Get(wfDefinition.DisplayName, "ttt.workflow");
WfDefinition = wfDefinition;
}
SqlWorkflowInstanceStore _store;
public SqlWorkflowInstanceStore GetStore()
{
if (Handle == null)
{
InitStore(_store, WfDefinition);
Handle = _store.CreateInstanceHandle();
var view = _store.Execute(Handle, new CreateWorkflowOwnerCommand
{
InstanceOwnerMetadata = { { WorkflowHostTypePropertyName, new InstanceValue(HostTypeName) } }
}, TimeSpan.FromSeconds(30));
_store.DefaultInstanceOwner = view.InstanceOwner;
//Trace.WriteLine(string.Format("{0} owns {1}", view.InstanceOwner.InstanceOwnerId, HostTypeName));
}
return _store;
}
protected virtual void InitStore(SqlWorkflowInstanceStore store, Activity wfDefinition)
{
}
public InstanceHandle Handle { get; protected set; }
XName HostTypeName { get; set; }
public void Dispose()
{
if (Handle != null)
{
var deleteOwner = new DeleteWorkflowOwnerCommand();
//Trace.WriteLine(string.Format("{0} frees {1}", Store.DefaultInstanceOwner.InstanceOwnerId, HostTypeName));
_store.Execute(Handle, deleteOwner, TimeSpan.FromSeconds(30));
Handle.Free();
Handle = null;
_store = null;
}
}
public WorkflowApplication GetApplication()
{
var wfApp = new WorkflowApplication(WfDefinition);
wfApp.InstanceStore = GetStore();
wfApp.PersistableIdle = e => PersistableIdleAction.Persist;
Dictionary<XName, object> wfScope = new Dictionary<XName, object> { { WorkflowHostTypePropertyName, HostTypeName } };
wfApp.AddInitialInstanceValues(wfScope);
return wfApp;
}
}
I'm not a Workflow Foundation expert, so my answer is based on official samples from Microsoft. The first is WF4 host resumes delayed workflow (CSWF4LongRunningHost) and the second is Microsoft.Samples.AbsoluteDelay. In both samples you will find code similar to yours, i.e.:
try
{
wfApp.LoadRunnableInstance();
...
}
catch (InstanceNotReadyException)
{
//Some logging
}
Taking this into account, the answer is that you are right: the empty catch for InstanceNotReadyException is a good solution.
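Applied to the Resume method from the question, that means swallowing the exception instead of rethrowing it, so the polling loop simply skips the spurious wake-up; a minimal sketch:
Guid Resume(StoreWrapper sw)
{
var res = Guid.Empty;
var events = sw.GetStore().WaitForEvents(sw.Handle, new TimeSpan(0, 0, delayTime));
if (events.Any(e => e.Equals(HasRunnableWorkflowEvent.Value)))
{
var obj = sw.GetApplication();
try
{
obj.LoadRunnableInstance();
// ... run the instance and wait for idle, as in the original Resume ...
res = obj.Id;
obj.Unload();
}
catch (InstanceNotReadyException)
{
// expected when the store reports a runnable instance that no longer exists;
// log if desired and return Guid.Empty, per the Microsoft samples
}
}
return res;
}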
I wrote a method to download data from the internet and save it to my database. I wrote it using PLINQ to take advantage of my multi-core processor, and because it downloads thousands of different files in a very short period of time. I have added comments in my code below to show where it stops, but the program just sits there, and after a while I get an out of memory exception. This being my first time using the TPL and PLINQ, I'm extremely confused, so I could really use some advice on how to fix this.
UPDATE: I found out that I was constantly getting a WebException because the WebClient was timing out. I fixed this by increasing the maximum number of connections according to this answer here. I was then getting exceptions for the connection not opening, which I fixed by using this answer here. I'm now getting connection timeout errors for the database even though it is a local SQL Server. I still haven't been able to get any of my code to run, so I could really use some advice.
static void Main(string[] args)
{
try
{
while (true)
{
// start the download process for market info
startDownload();
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}
public static void startDownload()
{
DateTime currentDay = DateTime.Now;
List<Task> taskList = new List<Task>();
if (Helper.holidays.Contains(currentDay) == false)
{
List<string> markets = new List<string>() { "amex", "nasdaq", "nyse", "global" };
Parallel.ForEach(markets, market =>
{
Downloads.startInitialMarketSymbolsDownload(market);
}
);
Console.WriteLine("All downloads finished!");
}
// wait 24 hours before you do this again
Task.Delay(TimeSpan.FromHours(24)).Wait();
}
public static void startInitialMarketSymbolsDownload(string market)
{
try
{
List<string> symbolList = new List<string>();
symbolList = Helper.getStockSymbols(market);
var historicalGroups = symbolList.AsParallel().Select((x, i) => new { x, i })
.GroupBy(x => x.i / 100)
.Select(g => g.Select(x => x.x).ToArray());
historicalGroups.AsParallel().ForAll(g => getHistoricalStockData(g, market));
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}
public static void getHistoricalStockData(string[] symbols, string market)
{
// download data for list of symbols and then upload to db tables
Uri uri;
string url, line;
decimal open = 0, high = 0, low = 0, close = 0, adjClose = 0;
DateTime date;
Int64 volume = 0;
string[] lineArray;
List<string> symbolError = new List<string>();
Dictionary<string, string> badNameError = new Dictionary<string, string>();
Parallel.ForEach(symbols, symbol =>
{
url = "http://ichart.finance.yahoo.com/table.csv?s=" + symbol + "&a=00&b=1&c=1900&d=" + (DateTime.Now.Month - 1) + "&e=" + DateTime.Now.Day + "&f=" + DateTime.Now.Year + "&g=d&ignore=.csv";
uri = new Uri(url);
using (dbEntities entity = new dbEntities())
using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead(uri))
using (StreamReader reader = new StreamReader(stream))
{
while (reader.EndOfStream == false)
{
line = reader.ReadLine();
lineArray = line.Split(',');
// if it isn't the very first line
if (lineArray[0] != "Date")
{
// set the data for each array here
date = Helper.parseDateTime(lineArray[0]);
open = Helper.parseDecimal(lineArray[1]);
high = Helper.parseDecimal(lineArray[2]);
low = Helper.parseDecimal(lineArray[3]);
close = Helper.parseDecimal(lineArray[4]);
volume = Helper.parseInt(lineArray[5]);
adjClose = Helper.parseDecimal(lineArray[6]);
switch (market)
{
case "nasdaq":
DailyNasdaqData nasdaqData = new DailyNasdaqData();
var nasdaqQuery = from r in entity.DailyNasdaqDatas.AsParallel().AsEnumerable()
where r.Date == date
select new StockData { Close = r.AdjustedClose };
List<StockData> nasdaqResult = nasdaqQuery.AsParallel().ToList(); // hits this line
break;
default:
break;
}
}
}
// now save everything
entity.SaveChanges();
}
}
);
}
Async lambdas work like async methods in one regard: they do not complete synchronously; they return a Task instead. In your parallel loop you are simply generating tasks as fast as you can. Those tasks hold onto memory and other resources, such as DB connections.
The simplest fix is probably to just use synchronous database commits. This will not cost you throughput, because the database cannot deal with large amounts of concurrent DML anyway.
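Concretely, the Parallel.ForEach from the question can be bounded, with every per-iteration variable declared inside the lambda (in the question, url, line, lineArray, etc. are declared outside the lambda and are therefore shared across threads) and with the synchronous commit suggested above. A sketch; the limit of 4 is an arbitrary example:
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(symbols, options, symbol =>
{
using (var entity = new dbEntities())
using (var client = new WebClient())
{
// build the URL, then download and parse the CSV for this symbol as in the
// question, keeping url/line/lineArray local to this lambda
entity.SaveChanges(); // synchronous commit
}
});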
The code below works as intended, but I am not sure it is the most elegant approach.
using (DatabaseContext context = DatabaseContext.CreateContext(_incompleteConnString + prefix + campaignDBPlatform))
{
Progress prog = new Progress();
TaskFactory tf = new TaskFactory();
var parent = tf.StartNew(() =>
Parallel.ForEach(QuestionsLangConstants.questionLangs.Values, i =>
{
try
{
qrepo.UploadQuestions(QWorkBook.Worksheets[i.QSheet], i, prog);
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
})
);
prog.Show();
var finalTask = parent.ContinueWith(i =>
{
using (DatabaseContext context2 = DatabaseContext.CreateContext(_incompleteConnString + prefix + campaignDBPlatform))
{
UploadedQuestionsRepliesRepository uqrepo = new
UploadedQuestionsRepliesRepository(context2);
UploadedQuestionsReplies UQuestions = new UploadedQuestionsReplies() {
Id = (int)uqrepo.getNextSeqValue(),
FileName = "test",
RQType = Questions.QuestionsType.ToString(),
TimeStamp = DateTime.Now
};
uqrepo.Insert(UQuestions);
uqrepo.Save();
}
});
}
If I do not add context2, context is already disposed by the time parent continues. If I use finalTask.Wait(), however, the UI freezes. Is there a better solution than what I have above?
You could avoid the first using statement: use a single context in both places and, if you are sure that the second task will always be executed, dispose of it at the end; otherwise add some extra code to make sure your context is always disposed.
DatabaseContext context2 = DatabaseContext.CreateContext(_incompleteConnString + prefix + campaignDBPlatform);
//Initialize qrepo with the context here??
Progress prog = new Progress();
TaskFactory tf = new TaskFactory();
var parent = tf.StartNew(() =>
Parallel.ForEach(QuestionsLangConstants.questionLangs.Values, i =>
{
try
{
qrepo.UploadQuestions(QWorkBook.Worksheets[i.QSheet], i, prog);
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
// Do you really want to continue with the next task after the exception?
}
})
);
prog.Show();
var finalTask = parent.ContinueWith(i =>
{
UploadedQuestionsRepliesRepository uqrepo = new UploadedQuestionsRepliesRepository(context2);
UploadedQuestionsReplies UQuestions = new UploadedQuestionsReplies() {
Id = (int)uqrepo.getNextSeqValue(),
FileName = "test",
RQType = Questions.QuestionsType.ToString(),
TimeStamp = DateTime.Now
};
uqrepo.Insert(UQuestions);
uqrepo.Save();
context2.Dispose();
});
}
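If the surrounding method can be made async, a cleaner alternative is to await the parallel work instead of chaining ContinueWith; the using block then covers both phases and the UI thread is never blocked. A rough sketch under the question's own types (UploadAsync is a hypothetical name):
private async Task UploadAsync()
{
using (DatabaseContext context = DatabaseContext.CreateContext(_incompleteConnString + prefix + campaignDBPlatform))
{
Progress prog = new Progress();
prog.Show();
// run the parallel upload on the thread pool without blocking the UI thread
await Task.Run(() =>
Parallel.ForEach(QuestionsLangConstants.questionLangs.Values, i =>
{
qrepo.UploadQuestions(QWorkBook.Worksheets[i.QSheet], i, prog);
}));
// execution resumes here after all uploads complete; context is still alive
UploadedQuestionsRepliesRepository uqrepo = new UploadedQuestionsRepliesRepository(context);
UploadedQuestionsReplies UQuestions = new UploadedQuestionsReplies
{
Id = (int)uqrepo.getNextSeqValue(),
FileName = "test",
RQType = Questions.QuestionsType.ToString(),
TimeStamp = DateTime.Now
};
uqrepo.Insert(UQuestions);
uqrepo.Save();
} // disposed only after both phases have finished
}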