I am using FFmpeg from my C# application to build a video stream out of raw, unencoded frames. For a single input stream this is fairly straightforward:
var argumentBuilder = new List<string>();
argumentBuilder.Add("-loglevel panic");
argumentBuilder.Add("-f h264");
argumentBuilder.Add("-i pipe:");
argumentBuilder.Add("-c:v libx264");
argumentBuilder.Add("-bf 0");
argumentBuilder.Add("-pix_fmt yuv420p");
argumentBuilder.Add("-an");
argumentBuilder.Add(filename);
startInfo.Arguments = string.Join(" ", argumentBuilder.ToArray());
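// Note: startInfo is not shown here; for the StandardInput/Output/Error usage
// below it is assumed to be a ProcessStartInfo with UseShellExecute = false
// and RedirectStandardInput/Output/Error = true.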
var _ffMpegProcess = new Process();
_ffMpegProcess.EnableRaisingEvents = true;
_ffMpegProcess.OutputDataReceived += (s, e) => { Debug.WriteLine(e.Data); };
_ffMpegProcess.ErrorDataReceived += (s, e) => { Debug.WriteLine(e.Data); };
_ffMpegProcess.StartInfo = startInfo;
Console.WriteLine($"[log] Starting write to {filename}...");
_ffMpegProcess.Start();
_ffMpegProcess.BeginOutputReadLine();
_ffMpegProcess.BeginErrorReadLine();
for (int i = 0; i < videoBuffer.Count; i++)
{
_ffMpegProcess.StandardInput.BaseStream.Write(videoBuffer[i], 0, videoBuffer[i].Length);
}
_ffMpegProcess.StandardInput.BaseStream.Close();
One of the challenges I am trying to address is writing to two input pipes, similar to how I could do it from, say, Node.js, by referring to pipe:4 or pipe:5. It seems that I can only write directly to standard input, not split it into separate "channels".
What's the approach to do this in C#?
Based on what is written here and on a good night of sleep (where I dreamt that I could use Stream.CopyAsync), this is the skeleton of the solution:
string pathToFFmpeg = @"C:\ffmpeg\bin\ffmpeg.exe";
string[] inputs = new[] { "video.m4v", "audio.mp3" };
string output = "output2.mp4";
var npsss = new NamedPipeServerStream[inputs.Length];
var fss = new FileStream[inputs.Length];
try
{
for (int i = 0; i < fss.Length; i++)
{
fss[i] = File.OpenRead(inputs[i]);
}
// We use Guid for pipeNames
var pipeNames = Array.ConvertAll(inputs, x => Guid.NewGuid().ToString("N"));
for (int i = 0; i < npsss.Length; i++)
{
npsss[i] = new NamedPipeServerStream(pipeNames[i], PipeDirection.Out, 1, PipeTransmissionMode.Byte, PipeOptions.Asynchronous);
}
string pipeNamesFFmpeg = string.Join(" ", pipeNames.Select(x => $@"-i \\.\pipe\{x}"));
using (var proc = new Process
{
StartInfo = new ProcessStartInfo
{
FileName = pathToFFmpeg,
Arguments = $@"-loglevel debug -y {pipeNamesFFmpeg} -c:v copy -c:a copy ""{output}""",
UseShellExecute = false,
}
})
{
Console.WriteLine($"FFMpeg path: {pathToFFmpeg}");
Console.WriteLine($"Arguments: {proc.StartInfo.Arguments}");
proc.EnableRaisingEvents = false;
proc.Start();
var tasks = new Task[npsss.Length];
for (int i = 0; i < npsss.Length; i++)
{
var pipe = npsss[i];
var fs = fss[i];
pipe.WaitForConnection();
tasks[i] = fs.CopyToAsync(pipe)
// .ContinueWith(_ => pipe.FlushAsync()) // Flush does nothing on Pipes
.ContinueWith(x => {
pipe.WaitForPipeDrain();
pipe.Disconnect();
});
}
Task.WaitAll(tasks);
proc.WaitForExit();
}
}
finally
{
foreach (var fs in fss)
{
fs?.Dispose();
}
foreach (var npss in npsss)
{
npss?.Dispose();
}
}
There are various attention points:
Not all formats are compatible with pipes. For example, many .mp4 files aren't, because they have their moov atom towards the end of the file, but ffmpeg needs it immediately, and pipes aren't seekable (ffmpeg can't jump to the end of the pipe, read the moov atom, and then jump back to the beginning). See here for an example.
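As a hedged aside (these are standard ffmpeg output options, not something from the original post): if you control how the input files are produced, you can write them as fragmented MP4 so they can later be read from a pipe, for example:
// Sketch: produce pipe-friendly MP4 (fragmented, no trailing moov atom).
// Use "-movflags +faststart" instead when the output is a seekable file.
argumentBuilder.Add("-movflags frag_keyframe+empty_moov");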
I receive an error at the end of the streaming. The file seems to be correct; I don't know why the error appears. Other people have reported it, but I haven't seen any explanation:
\\.\pipe\55afc0c8e95f4a4c9cec5ae492bc518a: Invalid argument
\\.\pipe\92205c79c26a410aa46b9b35eb3bbff6: Invalid argument
I don't normally use Task and async, so I'm not 100% sure that what I wrote is correct. This code doesn't work, for example:
tasks[i] = pipe.WaitForConnectionAsync().ContinueWith(x => fs.CopyToAsync(pipe, 4096)).ContinueWith(...);
(The problem is that the continuation's lambda returns a Task (fs.CopyToAsync) that is never awaited, so the outer chain moves on as soon as the copy starts, not when it finishes.)
Mmmmh, perhaps that last point can be solved:
tasks[i] = ConnectAndCopyToPipe(fs, pipe);
and then
public static async Task ConnectAndCopyToPipe(FileStream fs, NamedPipeServerStream pipe)
{
await pipe.WaitForConnectionAsync();
await fs.CopyToAsync(pipe);
// await fs.FlushAsync(); // Does nothing
pipe.WaitForPipeDrain();
pipe.Disconnect();
}
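Wiring that helper up for all the pipes then becomes (same variable names as above):
var tasks = new Task[npsss.Length];
for (int i = 0; i < npsss.Length; i++)
{
    tasks[i] = ConnectAndCopyToPipe(fss[i], npsss[i]);
}
Task.WaitAll(tasks); // or await Task.WhenAll(tasks) from an async method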
I am doing some performance tests on ZeroMQ in order to compare it with others like RabbitMQ and ActiveMQ.
In my broadcast tests, to avoid "The Dynamic Discovery Problem" as the ZeroMQ documentation calls it, I have used a proxy. In my scenario, 50 concurrent publishers each send 500 messages with a 1 ms delay between sends, and each message is then read by 50 subscribers. As I said, I am losing messages: each subscriber should receive a total of 25,000 messages, but each receives only between 5,000 and 10,000.
I am using Windows and the C# .NET client clrzmq4 (4.1.0.31).
I have already tried some solutions that I found on other posts:
I have set Linger to TimeSpan.MaxValue
I have set ReceiveHighWatermark to 0 (which is documented as infinite, but I have also tried Int32.MaxValue)
I have checked for slow-joining receivers: I made the receivers start some seconds before the publishers
I had to make sure that the socket instances are not garbage collected (Linger should take care of this, but just to be safe; see the sketch after this list)
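For that last point, a more direct technique than reading a socket property at the end of the method is GC.KeepAlive; a minimal sketch:
// Keeps 'publisher' reachable, and therefore not collectible, until this call.
GC.KeepAlive(publisher);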
I have a similar scenario (with similar logic) using NetMQ and it works fine. That scenario does not use security, though, and this one does (which is also the reason I use clrzmq here: I need client authentication with certificates, which is not yet possible in NetMQ).
EDIT:
public class MCVEPublisher
{
public void publish(int numberOfMessages)
{
string topic = "TopicA";
var context = ZContext.Create();
var publisher = new ZSocket(context, ZSocketType.PUB);
//Security
// Create or load certificates
ZCert serverCert = Main.GetOrCreateCert("publisher");
var actor = new ZActor(context, ZAuth.Action, null);
actor.Start();
// send CURVE settings to ZAuth
actor.Frontend.Send(new ZFrame("VERBOSE"));
actor.Frontend.Send(new ZMessage(new List<ZFrame>()
{ new ZFrame("ALLOW"), new ZFrame("127.0.0.1") }));
actor.Frontend.Send(new ZMessage(new List<ZFrame>()
{ new ZFrame("CURVE"), new ZFrame(".curve") }));
publisher.CurvePublicKey = serverCert.PublicKey;
publisher.CurveSecretKey = serverCert.SecretKey;
publisher.CurveServer = true;
publisher.Linger = TimeSpan.MaxValue;
publisher.ReceiveHighWatermark = Int32.MaxValue;
publisher.Connect("tcp://127.0.0.1:5678");
Thread.Sleep(3500);
for (int i = 0; i < numberOfMessages; i++)
{
Thread.Sleep(1);
var update = $"{topic} message";
using (var updateFrame = new ZFrame(update))
{
publisher.Send(updateFrame);
}
}
//just to make sure it does not end instantly
Thread.Sleep(60000);
//just to make sure publisher is not garbage collected
ulong Affinity = publisher.Affinity;
}
}
public class MCVESubscriber
{
private ZSocket subscriber;
private List<string> prints = new List<string>();
public void read()
{
string topic = "TopicA";
var context = new ZContext();
subscriber = new ZSocket(context, ZSocketType.SUB);
//Security
ZCert serverCert = Main.GetOrCreateCert("xpub");
ZCert clientCert = Main.GetOrCreateCert("subscriber");
subscriber.CurvePublicKey = clientCert.PublicKey;
subscriber.CurveSecretKey = clientCert.SecretKey;
subscriber.CurveServer = true;
subscriber.CurveServerKey = serverCert.PublicKey;
subscriber.Linger = TimeSpan.MaxValue;
subscriber.ReceiveHighWatermark = Int32.MaxValue;
// Connect
subscriber.Connect("tcp://127.0.0.1:1234");
subscriber.Subscribe(topic);
while (true)
{
using (var replyFrame = subscriber.ReceiveFrame())
{
string messageReceived = replyFrame.ReadString();
messageReceived = messageReceived.Split(' ')[1];
prints.Add(messageReceived);
}
}
}
public void PrintMessages()
{
Console.WriteLine("printing " + prints.Count);
}
}
public class Main
{
static void Main(string[] args)
{
broadcast(500, 50, 50, 30000);
}
public static void broadcast(int numberOfMessages, int numberOfPublishers, int numberOfSubscribers, int timeOfRun)
{
new Thread(() =>
{
using (var context = new ZContext())
using (var xsubSocket = new ZSocket(context, ZSocketType.XSUB))
using (var xpubSocket = new ZSocket(context, ZSocketType.XPUB))
{
//Security
ZCert serverCert = GetOrCreateCert("publisher");
ZCert clientCert = GetOrCreateCert("xsub");
xsubSocket.CurvePublicKey = clientCert.PublicKey;
xsubSocket.CurveSecretKey = clientCert.SecretKey;
xsubSocket.CurveServer = true;
xsubSocket.CurveServerKey = serverCert.PublicKey;
xsubSocket.Linger = TimeSpan.MaxValue;
xsubSocket.ReceiveHighWatermark = Int32.MaxValue;
xsubSocket.Bind("tcp://*:5678");
//Security
serverCert = GetOrCreateCert("xpub");
var actor = new ZActor(ZAuth.Action0, null);
actor.Start();
// send CURVE settings to ZAuth
actor.Frontend.Send(new ZFrame("VERBOSE"));
actor.Frontend.Send(new ZMessage(new List<ZFrame>()
{ new ZFrame("ALLOW"), new ZFrame("127.0.0.1") }));
actor.Frontend.Send(new ZMessage(new List<ZFrame>()
{ new ZFrame("CURVE"), new ZFrame(".curve") }));
xpubSocket.CurvePublicKey = serverCert.PublicKey;
xpubSocket.CurveSecretKey = serverCert.SecretKey;
xpubSocket.CurveServer = true;
xpubSocket.Linger = TimeSpan.MaxValue;
xpubSocket.ReceiveHighWatermark = Int32.MaxValue;
xpubSocket.Bind("tcp://*:1234");
using (var subscription = ZFrame.Create(1))
{
subscription.Write(new byte[] { 0x1 }, 0, 1);
xpubSocket.Send(subscription);
}
Console.WriteLine("Intermediary started, and waiting for messages");
// proxy messages between frontend / backend
ZContext.Proxy(xsubSocket, xpubSocket);
Console.WriteLine("end of proxy");
//just to make sure it does not end instantly
Thread.Sleep(60000);
//just to make sure xpubSocket and xsubSocket are not garbage collected
ulong Affinity = xpubSocket.Affinity;
int ReceiveHighWatermark = xsubSocket.ReceiveHighWatermark;
}
}).Start();
Thread.Sleep(5000); //to make sure proxy started
List<MCVESubscriber> Subscribers = new List<MCVESubscriber>();
for (int i = 0; i < numberOfSubscribers; i++)
{
MCVESubscriber ZeroMqSubscriber = new MCVESubscriber();
new Thread(() =>
{
ZeroMqSubscriber.read();
}).Start();
Subscribers.Add(ZeroMqSubscriber);
}
Thread.Sleep(10000);//to make sure all subscribers started
for (int i = 0; i < numberOfPublishers; i++)
{
MCVEPublisher ZeroMqPublisherBroadcast = new MCVEPublisher();
new Thread(() =>
{
ZeroMqPublisherBroadcast.publish(numberOfMessages);
}).Start();
}
Thread.Sleep(timeOfRun);
foreach (MCVESubscriber Subscriber in Subscribers)
{
Subscriber.PrintMessages();
}
}
public static ZCert GetOrCreateCert(string name, string curvpath = ".curve")
{
ZCert cert;
string keyfile = Path.Combine(curvpath, name + ".pub");
if (!File.Exists(keyfile))
{
cert = new ZCert();
Directory.CreateDirectory(curvpath);
cert.SetMeta("name", name);
cert.Save(keyfile);
}
else
{
cert = ZCert.Load(keyfile);
}
return cert;
}
}
This code produces the expected number of messages when security is disabled, but with security turned on it doesn't.
Does anyone know of something else to check, or has this happened to anyone before?
Thanks
I need to process a very large text file (6-8 GB). I wrote the code attached below. Unfortunately, every time the output file (created next to the source file) reaches ~2 GB, I observe a sudden jump in memory consumption (from ~100 MB to a few GB) and, as a result, an out-of-memory exception.
The debugger indicates that the OOM occurs at while ((tempLine = streamReader.ReadLine()) != null).
I am targeting .NET 4.7 and x64 architecture only.
A single line is at most 50 characters long.
I could work around this by splitting the original file into smaller parts, processing them separately, and merging the results back into one file at the end, but I would prefer not to.
Code:
public async Task PerformDecodeAsync(string sourcePath, string targetPath)
{
var allLines = CountLines(sourcePath);
long processedlines = default;
using (File.Create(targetPath));
var streamWriter = File.AppendText(targetPath);
var decoderBlockingCollection = new BlockingCollection<string>(1000);
var writerBlockingCollection = new BlockingCollection<string>(1000);
var producer = Task.Factory.StartNew(() =>
{
using (var streamReader = new StreamReader(File.OpenRead(sourcePath), Encoding.Default, true))
{
string tempLine;
while ((tempLine = streamReader.ReadLine()) != null)
{
decoderBlockingCollection.Add(tempLine);
}
decoderBlockingCollection.CompleteAdding();
}
});
var consumer1 = Task.Factory.StartNew(() =>
{
foreach (var line in decoderBlockingCollection.GetConsumingEnumerable())
{
short decodeCounter = 0;
StringBuilder builder = new StringBuilder();
foreach (var singleChar in line)
{
var positionInDecodeKey = decodingKeysList[decodeCounter].IndexOf(singleChar);
if (positionInDecodeKey > 0)
builder.Append(model.Substring(positionInDecodeKey, 1));
else
builder.Append(singleChar);
if (decodeCounter > 18)
decodeCounter = 0;
else ++decodeCounter;
}
writerBlockingCollection.TryAdd(builder.ToString());
Interlocked.Increment(ref processedlines);
if (processedlines == (long)allLines)
writerBlockingCollection.CompleteAdding();
}
});
var writer = Task.Factory.StartNew(() =>
{
foreach (var line in writerBlockingCollection.GetConsumingEnumerable())
{
streamWriter.WriteLine(line);
}
});
Task.WaitAll(producer, consumer1, writer);
}
Solutions, as well as advice on how to optimize it a little more, are greatly appreciated.
Like I said, I'd probably go for something simpler first, unless or until it's demonstrated that it's not performing well. As Adi said in their answer, this work appears to be I/O bound - so there seems little benefit in creating multiple tasks for it.
public void PerformDecode(string sourcePath, string targetPath)
{
File.WriteAllLines(targetPath, File.ReadLines(sourcePath).Select(line =>
{
short decodeCounter = 0;
StringBuilder builder = new StringBuilder();
foreach (var singleChar in line)
{
var positionInDecodeKey = decodingKeysList[decodeCounter].IndexOf(singleChar);
if (positionInDecodeKey > 0)
builder.Append(model.Substring(positionInDecodeKey, 1));
else
builder.Append(singleChar);
if (decodeCounter > 18)
decodeCounter = 0;
else ++decodeCounter;
}
return builder.ToString();
}));
}
Now, of course, this code actually blocks until it's done, which is why I haven't marked it async. But then, so did yours, and the compiler should already have been warning you about that (an async method with no await in it runs synchronously).
(You could try using PLINQ instead of LINQ for the Select portion, but honestly, the amount of processing we're doing here looks trivial; profile first before applying any such change.)
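For illustration, a minimal PLINQ variant might look like this; Decode is a hypothetical helper holding the same per-line loop as above, and AsOrdered keeps the output lines in input order:
// Sketch: parallelize only the per-line transform.
File.WriteAllLines(targetPath,
    File.ReadLines(sourcePath)
        .AsParallel()
        .AsOrdered()
        .Select(line => Decode(line)));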
As the work you are doing is mostly I/O bound, you aren't really gaining anything from parallelization. It also looks to me (correct me if I'm wrong) like your transformation algorithm doesn't depend on reading the file line by line, so I would recommend instead doing something like this:
void Main()
{
//Setup streams for testing
using(var inputStream = new MemoryStream())
using(var outputStream = new MemoryStream())
using (var inputWriter = new StreamWriter(inputStream))
using (var outputReader = new StreamReader(outputStream))
{
//Write test string and rewind stream
inputWriter.Write("abcdefghijklmnop");
inputWriter.Flush();
inputStream.Seek(0, SeekOrigin.Begin);
var inputBuffer = new byte[5];
var outputBuffer = new byte[5];
int inputLength;
while ((inputLength = inputStream.Read(inputBuffer, 0, inputBuffer.Length)) > 0)
{
for (var i = 0; i < inputLength; i++)
{
//transform each character
outputBuffer[i] = ++inputBuffer[i];
}
//Write to output
outputStream.Write(outputBuffer, 0, inputLength);
}
//Read for testing
outputStream.Seek(0, SeekOrigin.Begin);
var output = outputReader.ReadToEnd();
Console.WriteLine(output);
//Outputs: "bcdefghijklmnopq"
}
}
Obviously, you would be using FileStreams instead of MemoryStreams, and you can increase the buffer length to something much larger (this was just a demonstrative example). Also, since your original method is async, you could use the async variants of Stream.Write and Stream.Read.
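A minimal async sketch of the same loop over FileStreams (the method name, paths, and buffer size are illustrative assumptions):
public async Task TransformAsync(string sourcePath, string targetPath)
{
    var buffer = new byte[81920]; // assumption: a reasonable buffer size
    using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, buffer.Length, useAsync: true))
    using (var output = new FileStream(targetPath, FileMode.Create, FileAccess.Write, FileShare.None, buffer.Length, useAsync: true))
    {
        int read;
        while ((read = await input.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            for (var i = 0; i < read; i++)
            {
                // placeholder transform, as in the demo above
                buffer[i] = (byte)(buffer[i] + 1);
            }
            await output.WriteAsync(buffer, 0, read);
        }
    }
}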
I have an SSIS package that launches another SSIS package in a Foreach container; because the container reports completion as soon as it has launched all the packages it had to launch, I need a way to make it wait until all the "child" packages have completed.
So I implemented a little sleep-wait loop that basically pulls the Execution objects off the SSISDB for the ID's I'm interested in.
The problem I'm facing is that a grand total of 0 Dts.Events.FireProgress events get fired, and if I uncomment the Dts.Events.FireInformation call in the do loop, then every second I get a message saying that 23 packages are still running, even though SSISDB's Active Operations window shows that most have already completed and only 3 or 4 are actually running.
What am I doing wrong, why wouldn't runningCount contain the number of actually running executions?
using ssis = Microsoft.SqlServer.Management.IntegrationServices;
public void Main()
{
const string serverName = "REDACTED";
const string catalogName = "SSISDB";
var ssisConnectionString = $"Data Source={serverName};Initial Catalog=msdb;Integrated Security=SSPI;";
var ids = GetExecutionIDs(serverName);
var idCount = ids.Count();
var previousCount = -1;
var iterations = 0;
try
{
var fireAgain = true;
const int secondsToSleep = 1;
var sleepTime = TimeSpan.FromSeconds(secondsToSleep);
var maxIterations = TimeSpan.FromHours(1).TotalSeconds / sleepTime.TotalSeconds;
IDictionary<long, ssis.Operation.ServerOperationStatus> catalogExecutions;
using (var connection = new SqlConnection(ssisConnectionString))
{
var server = new ssis.IntegrationServices(connection);
var catalog = server.Catalogs[catalogName];
do
{
catalogExecutions = catalog.Executions
.Where(execution => ids.Contains(execution.Id))
.ToDictionary(execution => execution.Id, execution => execution.Status);
var runningCount = catalogExecutions.Count(kvp => kvp.Value == ssis.Operation.ServerOperationStatus.Running);
System.Threading.Thread.Sleep(sleepTime);
//Dts.Events.FireInformation(0, "ScriptMain", $"{runningCount} packages still running.", string.Empty, 0, ref fireAgain);
if (runningCount != previousCount)
{
previousCount = runningCount;
decimal completed = idCount - runningCount;
decimal percentCompleted = completed / idCount;
Dts.Events.FireProgress($"Waiting... {completed}/{idCount} completed", Convert.ToInt32(100 * percentCompleted), 0, 0, "", ref fireAgain);
}
iterations++;
if (iterations >= maxIterations)
{
Dts.Events.FireWarning(0, "ScriptMain", $"Timeout expired, requesting cancellation.", string.Empty, 0);
Dts.Events.FireQueryCancel();
Dts.TaskResult = (int)Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Canceled;
return;
}
}
while (catalogExecutions.Any(kvp => kvp.Value == ssis.Operation.ServerOperationStatus.Running));
}
}
catch (Exception exception)
{
if (exception.InnerException != null)
{
Dts.Events.FireError(0, "ScriptMain", exception.InnerException.ToString(), string.Empty, 0);
}
Dts.Events.FireError(0, "ScriptMain", exception.ToString(), string.Empty, 0);
Dts.Log(exception.ToString(), 0, new byte[0]);
Dts.TaskResult = (int)ScriptResults.Failure;
return;
}
Dts.TaskResult = (int)ScriptResults.Success;
}
The GetExecutionIDs function simply returns all execution ID's for the child packages, from my metadata database.
The problem is that you're re-using the same connection at every iteration. Turn this:
using (var connection = new SqlConnection(ssisConnectionString))
{
var server = new ssis.IntegrationServices(connection);
var catalog = server.Catalogs[catalogName];
do
{
catalogExecutions = catalog.Executions
.Where(execution => ids.Contains(execution.Id))
.ToDictionary(execution => execution.Id, execution => execution.Status);
Into this:
do
{
using (var connection = new SqlConnection(ssisConnectionString))
{
var server = new ssis.IntegrationServices(connection);
var catalog = server.Catalogs[catalogName];
catalogExecutions = catalog.Executions
.Where(execution => ids.Contains(execution.Id))
.ToDictionary(execution => execution.Id, execution => execution.Status);
}
And you'll get the correct execution status every time. I'm not sure why the connection can't be reused (presumably the object model caches operation state for the connection's lifetime), but keeping connections as short-lived as possible is always a good idea anyway, and this is another point in its favor.
I want to create a publisher/subscriber model using the MSMQ multicast feature.
I have already followed the answer in this link, without success: MSMQ - Cannot receive from Multicast queues.
Messages are being sent and received on the local machine.
Sender:
using (var helloQueue = new MessageQueue("FormatName:MULTICAST=234.1.1.1:8001"))
{
while (true)
{
var stopWatch = new Stopwatch();
stopWatch.Start();
for (var i = 0; i < 1000; i++)
{
SendMessage(helloQueue,
string.Format("{0}: msg:{1} hello world ", DateTime.UtcNow.Ticks, i));
}
stopWatch.Stop();
Console.ReadLine();
Console.WriteLine("====================================================");
Console.WriteLine("[MSMQ] done sending 1000 messages in " + stopWatch.ElapsedMilliseconds);
Console.WriteLine("[MSMQ] Sending reset counter to consumers.");
SendMessage(helloQueue, "reset");
Console.ReadLine();
}
}
Receiver:
int messagesReceived = 0;
var messages = new Queue<string>(5000);
var filePath = typeof(Subscriber).FullName + ".txt";
var path = @".\private$\hello-queue";
using (var helloQueue = new MessageQueue(path))
{
helloQueue.MulticastAddress = "234.1.1.1:8001";
while (true)
{
var message = helloQueue.Receive();
if (message == null)
return;
var reader = new StreamReader(message.BodyStream);
var body = reader.ReadToEnd();
messagesReceived += 1;
messages.Enqueue(body);
Console.WriteLine(" [MSMQ] {0} Received {1}", messagesReceived, body);
if (string.CompareOrdinal("reset", body) == 0)
{
messagesReceived = 0;
File.WriteAllText(filePath, body);
messages.Clear();
}
}
}
I added a registry key for multicast binding with the IP shown in the event log (I'm not sure about this step).
The MulticastAddress that we specify on the queue: does it have to be something specific, or can we use anything in the specified range?
This got resolved by just changing the port number. The rest was fine.
I wrote a method to download data from the internet and save it to my database. I wrote it using PLINQ to take advantage of my multi-core processor, and because it downloads thousands of different files in a very short period of time. I have added comments below in my code to show where it stops, but the program just sits there, and after a while I get an out-of-memory exception. This being my first time using TPL and PLINQ, I'm extremely confused, so I could really use some advice on what to do to fix this.
UPDATE: I found out that I was constantly getting a WebException because the WebClient was timing out. I fixed this by increasing the maximum number of connections according to this answer here. I was then getting exceptions for the connection not opening, and I fixed that by using this answer here. I'm now getting connection timeout errors for the database, even though it is a local SQL Server. I still haven't been able to get any of my code to run, so I could totally use some advice.
static void Main(string[] args)
{
try
{
while (true)
{
// start the download process for market info
startDownload();
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}
public static void startDownload()
{
DateTime currentDay = DateTime.Now;
List<Task> taskList = new List<Task>();
if (Helper.holidays.Contains(currentDay) == false)
{
List<string> markets = new List<string>() { "amex", "nasdaq", "nyse", "global" };
Parallel.ForEach(markets, market =>
{
Downloads.startInitialMarketSymbolsDownload(market);
}
);
Console.WriteLine("All downloads finished!");
}
// wait 24 hours before you do this again
Task.Delay(TimeSpan.FromHours(24)).Wait();
}
public static void startInitialMarketSymbolsDownload(string market)
{
try
{
List<string> symbolList = new List<string>();
symbolList = Helper.getStockSymbols(market);
var historicalGroups = symbolList.AsParallel().Select((x, i) => new { x, i })
.GroupBy(x => x.i / 100)
.Select(g => g.Select(x => x.x).ToArray());
historicalGroups.AsParallel().ForAll(g => getHistoricalStockData(g, market));
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}
public static void getHistoricalStockData(string[] symbols, string market)
{
// download data for list of symbols and then upload to db tables
Uri uri;
string url, line;
decimal open = 0, high = 0, low = 0, close = 0, adjClose = 0;
DateTime date;
Int64 volume = 0;
string[] lineArray;
List<string> symbolError = new List<string>();
Dictionary<string, string> badNameError = new Dictionary<string, string>();
Parallel.ForEach(symbols, symbol =>
{
url = "http://ichart.finance.yahoo.com/table.csv?s=" + symbol + "&a=00&b=1&c=1900&d=" + (DateTime.Now.Month - 1) + "&e=" + DateTime.Now.Day + "&f=" + DateTime.Now.Year + "&g=d&ignore=.csv";
uri = new Uri(url);
using (dbEntities entity = new dbEntities())
using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead(uri))
using (StreamReader reader = new StreamReader(stream))
{
while (reader.EndOfStream == false)
{
line = reader.ReadLine();
lineArray = line.Split(',');
// if it isn't the very first line
if (lineArray[0] != "Date")
{
// set the data for each array here
date = Helper.parseDateTime(lineArray[0]);
open = Helper.parseDecimal(lineArray[1]);
high = Helper.parseDecimal(lineArray[2]);
low = Helper.parseDecimal(lineArray[3]);
close = Helper.parseDecimal(lineArray[4]);
volume = Helper.parseInt(lineArray[5]);
adjClose = Helper.parseDecimal(lineArray[6]);
switch (market)
{
case "nasdaq":
DailyNasdaqData nasdaqData = new DailyNasdaqData();
var nasdaqQuery = from r in entity.DailyNasdaqDatas.AsParallel().AsEnumerable()
where r.Date == date
select new StockData { Close = r.AdjustedClose };
List<StockData> nasdaqResult = nasdaqQuery.AsParallel().ToList(); // hits this line
break;
default:
break;
}
}
}
// now save everything
entity.SaveChanges();
}
}
);
}
Async lambdas work like async methods in one regard: they do not complete synchronously; they return a Task. In your parallel loop you are simply generating tasks as fast as you can. Those tasks hold onto memory and other resources, such as DB connections.
The simplest fix is probably to just use synchronous database commits. This will not cost you throughput, because the database cannot deal with high amounts of concurrent DML anyway.
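As a hedged illustration of that advice (dbEntities and the overall shape come from the question's code; the degree of parallelism is an assumption to tune), bounding the concurrency and committing synchronously might look like:
// Sketch: cap concurrent downloads so tasks don't pile up holding
// memory and DB connections; commit synchronously in each iteration.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(symbols, options, symbol =>
{
    using (var entity = new dbEntities())
    using (var client = new WebClient())
    {
        // ... download and parse the CSV for this symbol, as before ...
        entity.SaveChanges(); // synchronous commit
    }
});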