I am working on a custom text editor control and ran into this problem.
I need a function that returns the character index of every newline "\n" in the text.
I already have two ways to accomplish this:
private List<int> GetNewLineLocations()
{
var list = new List<int>();
int ix = 0;
foreach (var c in this.Text)
{
if (c == '\n') list.Add(ix);
ix++;
}
Debug.WriteLine(ix);
return list;
}
And:
private List<int> GetNewLineLocations()
{
var list = new List<int>();
int ix = -1;
for (int i = 0; i < this.Lines.Length; i++)
{
ix += Lines[i].Length;
ix += 1;
list.Add(ix);
}
return list;
}
The first solution does work, but it slows down the more text is entered into the richtextbox: around 40,000 characters, which can be spread out over a lot of rows, say 20,000.
The second one seems like it should be faster because it loops fewer times and does more or less the same work, but it slows down dramatically at about 1,000 rows no matter how much text they contain.
The code of course needs to run fast and not use a lot of resources, which is why I thought the second solution would be better.
My question is:
Which solution is better and why?
Why is the second solution so much slower?
Is there an even better solution?
I tried both of your examples, Felix's, and a solution of my own using a rich text box and 40k lines. This one turned out to be the fastest, and I saw no slowdown. Can you try passing the array of lines as a parameter and let us know the result?
public static List<int> GetNewLineLocations(this string[] lines)
{
var list = new List<int>();
int ix = -1;
for (int i = 0; i < lines.Length; i++)
{
ix += lines[i].Length+1;
list.Add(ix);
}
return list;
}
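Assuming the extension method above lives in a static class (it has to, being an extension method), a quick usage sketch from inside the control would be:
string[] lines = this.Lines; // read the Lines property once instead of hitting the control repeatedly
List<int> newLineLocations = lines.GetNewLineLocations();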
When working with strings, regular expressions are very nice to use, but they are not the fastest. If you need faster processing you should work at a lower level and in parallel. Also make sure to use long as the index type, because int only lets you address up to 2^31 characters, while long goes up to 2^63.
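For example, the straightforward Regex version would look roughly like this (a sketch only, assuming text holds the string to scan; readable, but as said, not the fastest on large texts):
using System.Text.RegularExpressions;
// ...
var newLinePositions = new List<int>();
foreach (Match m in Regex.Matches(text, "\n"))
{
newLinePositions.Add(m.Index); // Match.Index is the character offset of the newline
}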
I agree with #Nyerguds,
who said in the comments:
The problem is that the standard function to fetch the text in a rich text box is actually a processing function that has to filter out the RTF markup. The actual function to fetch the text is the bottleneck, not what comes after it.
So your data should be held somewhere in your code and not in the user interface. Sooner or later, processing long texts through the control will cause trouble anyway, such as stuttering when scrolling or further bottlenecks, and I would only give the control the lines that can actually be displayed. So you should rethink your application design and check your front-end/back-end separation. Storing your data in a back end allows you to access it directly without depending on your textbox methods or other user-interface code.
Here is a sample of how to process data easily in parallel with the Parallel class of the .NET Framework:
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApp1
{
internal class Program
{
public static byte[] _globalDataStore { get; set; }
private static void Main(string[] args)
{
DoStuff();
ShowDone();
}
private static void ShowDone()
{
Console.WriteLine("done...");
Console.ReadKey();
}
private static void DoStuff()
{
var tempData = GetData();
StoreData(ref tempData);
tempData = null; //free some ram
var dataIdentifier = (byte)'\n';
GetAndPromptDataPositions(_globalDataStore, dataIdentifier);
}
private static void GetAndPromptDataPositions<T>(T[] data, T dataIdentifier)
{
var dataPositionList = GetDataPositions<T>(data, dataIdentifier);
PromptDataPositions(dataPositionList);
}
private static void PromptDataPositions(IEnumerable<long> positionList)
{
foreach (var position in positionList)
{
Console.WriteLine($"Position '{position}'");
}
}
private static string GetData()
{
return "aasdlj\naksdlkajsdlkasldj\nasld\njkalskdjasldjlasd";
}
private static void StoreData(ref string tempData)
{
_globalDataStore = Encoding.ASCII.GetBytes(tempData);
}
private static List<long> GetDataPositions<T>(T[] data, T dataToFind)
{
lock (data) //prevent data from being changed while processing; important when other threads could write to it
{
var positionList = new List<long>();
Parallel.For(0, data.LongLength, (position) =>
{
if (data[position].Equals(dataToFind))
{
lock (positionList) //lock the list because multiple threads add to it concurrently
{
positionList.Add(position);
}
}
});
return positionList;
}
}
}
}
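One caveat worth adding: because Parallel.For visits the indexes on several threads, the returned positions are not in order, so a caller will typically sort them, e.g.:
var positions = GetDataPositions(_globalDataStore, (byte)'\n');
positions.Sort(); // Parallel.For adds the indexes out of order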
Related
I know of and have used System.Collections.Concurrent.ConcurrentBag<T> for building thread-safe code in the past. I have some legacy code that I'm trying to make multithreaded to increase performance; however, there is a non-static List object that is being written to from different sources of data. All of the writing is done prior to any reading of the list, and my initial tests show that multiple threads appear to write to the object without any issues.
Sample Windows App
Does a non-static C# List object have thread safety for writes across multiple threads prior to reading? How can this be tested?
BackgroundWorker backgroundWorkerA, backgroundWorkerB;
System.Threading.ManualResetEvent manualReset;
List<string> _shardList = new List<string>(0);
public UserControl1()
{
InitializeComponent();
manualReset = new System.Threading.ManualResetEvent(false);
backgroundWorkerA = new BackgroundWorker();
backgroundWorkerA.WorkerSupportsCancellation = true;
backgroundWorkerA.DoWork += BackgroundWorkerA_DoWork;
backgroundWorkerB = new BackgroundWorker();
backgroundWorkerB.WorkerSupportsCancellation = true;
backgroundWorkerB.DoWork += BackgroundWorkerB_DoWork;
this.HandleCreated += UserControl1_HandleCreated;
}
private void UserControl1_HandleCreated(object sender, EventArgs e)
{
backgroundWorkerA.RunWorkerAsync(_shardList);
backgroundWorkerB.RunWorkerAsync(_shardList);
manualReset.Set();
}
private void BackgroundWorkerB_DoWork(object sender, DoWorkEventArgs e)
{
List<string> _shardList = (List<string>)e.Argument;
manualReset.WaitOne();
int _i = 0;
while(!this.backgroundWorkerB.CancellationPending)
{
_shardList.Add("b" + _i++.ToString());
System.Diagnostics.Debug.WriteLine("b is running");
}
thread2.Invoke(new MethodInvoker(delegate { thread2.Text = string.Join(System.Environment.NewLine, _shardList.ToArray()); }));
}
private void button1_Click(object sender, EventArgs e)
{
backgroundWorkerA.CancelAsync();
backgroundWorkerB.CancelAsync();
}
private void BackgroundWorkerA_DoWork(object sender, DoWorkEventArgs e)
{
List<string> _shardList = (List<string>)e.Argument;
manualReset.WaitOne();
int _i = 0;
while (!this.backgroundWorkerA.CancellationPending)
{
_shardList.Add("a" + _i++.ToString());
System.Diagnostics.Debug.WriteLine("a is running");
}
thread1.Invoke(new MethodInvoker(delegate { thread1.Text = string.Join(System.Environment.NewLine, _shardList.ToArray()); }));
}
There are multiple things that make concurrent writes to a List<T> unsafe.
First let's have a look at the code of the Add method:
public void Add(T item) {
if (_size == _items.Length) EnsureCapacity(_size + 1);
_items[_size++] = item;
_version++;
}
The first issue is EnsureCapacity. If the list's inner array isn't big enough to receive the new element, it will create a new, bigger array and copy the elements from the old one to the new one. If a thread writes to the old array after the copy but before the swap, that element will be lost.
The second issue is the non-atomic increment of _size. If two threads try to write at the same time, they may write at the same index in the array, thus losing one item.
Those race conditions are not very likely, but they'll eventually happen if you keep writing in the same list from multiple threads.
When you modify a list, it has to modify its backing array. If one operation is making a change to the backing array at the same time as another, it can put the list into a broken state. You won't see this often unless you're doing very high-frequency concurrent operations, but it's a lot better to use a concurrent collection than to discover the issue in production a few weeks or months later.
The following code just executes 1000000 writes in a row simultaneously on each core. On a multi-core machine, this will almost certainly throw an exception because the underlying array gets modified when another concurrent call is not expecting it.
static void Main(string[] args)
{
var list = new List<string>();
void mutateList()
{
for (var i = 0; i < 1000000; i++)
{
list.Add("foo");
}
}
for (var i = 0; i < Environment.ProcessorCount; i++)
{
new Thread(mutateList).Start();
}
Thread.Sleep(-1);
}
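For comparison, here is a minimal sketch of the same stress test with a concurrent collection (ConcurrentBag<T> here; a List<T> guarded by a lock would also do), which runs to completion without corrupting state:
using System;
using System.Collections.Concurrent;
using System.Threading;
class Program
{
static void Main(string[] args)
{
var bag = new ConcurrentBag<string>();
void mutateBag()
{
for (var i = 0; i < 1000000; i++)
{
bag.Add("foo"); // thread-safe add, no shared backing-array race
}
}
for (var i = 0; i < Environment.ProcessorCount; i++)
{
new Thread(mutateBag).Start();
}
Thread.Sleep(-1);
}
}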
So here is the program again.
As you can see, I have created two methods, each with a working loop,
and created two threads in Main pointing at these methods, which are then started.
The output I get is that each loop's value simply ends up on its own new line, one after another.
But what I want is for them to appear side by side on the same line, as if the page were divided into two columns.
I do not want them to run separately; they should run at the same time and write on the same line but in different columns.
I know this could be achieved by writing both values in the same Console.WriteLine, but I want to achieve it this way, with these two threads.
Please provide solutions that would work.
Thanks
using System;
using System.Threading;
class Program
{
static void Main(string[] args)
{
Thread t1 = new Thread(code1);
Thread t2 = new Thread(code2);
t1.Start();
t2.Start();
}
static void code1()
{
for (int i = 0; i < 50; i++)
{
Console.WriteLine(i);
Thread.Sleep(1000);
}
}
static void code2()
{
for (int i = 0; i < 50; i++)
{
Console.WriteLine("\t\t" + i); // the tab escape is "\t", not "/t"
Thread.Sleep(1000);
}
}
}
You have to use the Console.SetCursorPosition(int left, int top) method, so you can write to the console starting from any position you want, including back in previous rows.
Obviously, you have to keep track of the position for each thread, that is, the current row of that thread and its first column.
In my example I made two threads, one with its first column at position 0 and the second with its first column at position 30. Be careful about the width of the strings you write, or they will overflow their own space on the console.
Also, because you are doing this in a multithreaded app, you need a lock around the console. Otherwise, suppose this: one thread sets the cursor position, then another thread sets it, then the scheduler returns to the first thread... and the first thread writes at the second thread's position!
This is a very simple Console Program that gets the point:
using System;
using System.Threading;
namespace StackOverflow_3_multithread_on_console
{
class Program
{
static Random _random = new Random();
static void Main(string[] args)
{
var t1 = new Thread(Run1);
var t2 = new Thread(Run2);
t1.Start();
t2.Start();
}
static void Run1()
{
for(int i = 0; i < 30; i++)
{
Thread.Sleep(_random.Next(2000)); //for test
ConsoleLocker.Write("t1:" + i.ToString(), 0, i);
}
}
static void Run2()
{
for (int i = 0; i < 30; i++)
{
Thread.Sleep(_random.Next(2000)); //for test
ConsoleLocker.Write("t2:" + i.ToString(), 30, i);
}
}
}
static class ConsoleLocker
{
private static object _lock = new object();
public static void Write(string s, int left, int top)
{
lock (_lock)
{
Console.SetCursorPosition(left, top);
Thread.Sleep(100); //for test
Console.Write(s);
}
}
}
}
All the Thread.Sleep calls are there just to demonstrate that the lock works well. You can remove them all, especially the one in ConsoleLocker.
I am using two threads in a C# application that access the same BlockingCollection. This works fine, but I want to retrieve the first value twice so that both threads retrieve the same value *.
After a few seconds I want to poll the currentIndex of both threads and delete every value below that index. So, for example, if the lowest currentIndex among the threads is 5, the application deletes the items at indexes 0-5 in the queue. Another solution would be to delete a value in the queue once all threads have processed it.
How can I accomplish this? I think I need another type of buffer..?
Thank you in advance!
*If .Take() is called by thread1, the item is removed from the collection and thread2 can't get the same item again.
Update:
I want to store data in a buffer so that, for example, thread1 saves the data to a HDD and thread2 analyzes the (same) data concurrently.
Use a producer-consumer pattern to add each value to two separate ConcurrentQueues. Have each thread dequeue and then process items from its own queue.
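A rough sketch of that idea, kept close to the BlockingCollection you are already using rather than raw ConcurrentQueues (all names here are made up):
using System.Collections.Concurrent;
class Broadcaster
{
public BlockingCollection<byte[]> DiskQueue { get; } = new BlockingCollection<byte[]>();
public BlockingCollection<byte[]> AnalysisQueue { get; } = new BlockingCollection<byte[]>();
// Producer side: every item goes into both collections (broadcast),
// so Take() on one consumer can never steal an item from the other.
public void Add(byte[] data)
{
DiskQueue.Add(data);
AnalysisQueue.Add(data);
}
public void CompleteAdding()
{
DiskQueue.CompleteAdding();
AnalysisQueue.CompleteAdding();
}
}
// thread1: foreach (var d in broadcaster.DiskQueue.GetConsumingEnumerable()) SaveToHdd(d);
// thread2: foreach (var d in broadcaster.AnalysisQueue.GetConsumingEnumerable()) Analyze(d);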
Edit 7/4/14:
Here's a hazy, hacky, half-thought-out solution: create a custom object that is buffered. It could include space both for the information you're trying to buffer in thread 1 and for the analysis results from thread 2.
Add the objects to a buffer in thread 1 and also to a BlockingCollection. Use thread 2 to analyse them and update the objects with the results. The blocking collection shouldn't get too big, and since it's only dealing with references it shouldn't hit your memory. This assumes that you won't be modifying the info in the buffer at the same time on both threads.
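To make that first option a bit more concrete, an untested sketch (BufferedItem and Analyze are invented names; it assumes thread 1 fills Data before queuing and never touches it afterwards):
class BufferedItem
{
public byte[] Data; // written by thread 1 before the item is queued
public string AnalysisResult; // filled in by thread 2 afterwards
}
// thread 1: keep a reference in your own buffer, then hand the same object to thread 2
// var item = new BufferedItem { Data = data };
// buffer.Add(item);
// queue.Add(item); // queue is a BlockingCollection<BufferedItem>
// thread 2: take items and write the result back into the same object
// foreach (var item in queue.GetConsumingEnumerable())
//     item.AnalysisResult = Analyze(item.Data);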
Another, also half thought out solution is to feed the info into the buffer and a blocking collection simultaneously. Analyse the data from the BlockingCollection, feed it into an output collection and match them up with the buffer again. This option can handle concurrent modification if you do it right, but is probably more work.
I think option one is better. As I've pointed out, these are only half-formed, but they might help you find something that suits your specific needs. Good luck.
I would suggest rethinking your design.
When you have a list of items that have to be processed, give each thread its own queue of items to work on.
With such a solution it wouldn't be a problem to give both (or more) threads the same value to process.
Something like this (not tested, just typed):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Collections.Concurrent;
namespace ConsoleApplication2
{
class Item
{
private int _value;
public int Value
{
get
{
return _value;
}
}
// all you need
public Item(int i)
{
_value = i;
}
}
class WorkerParameters
{
public ConcurrentQueue<Item> Items = new ConcurrentQueue<Item>();
}
class Worker
{
private Thread _thread;
private WorkerParameters _params = new WorkerParameters();
public void EnqueueItem(Item item)
{
_params.Items.Enqueue(item);
}
public void Start()
{
_thread = new Thread(new ParameterizedThreadStart(ThreadProc));
_thread.Start();
}
public void Stop()
{
// build something to stop your thread
}
public static void ThreadProc(object threadParams)
{
WorkerParameters p = (WorkerParameters)threadParams;
while (true)
{
while (p.Items.Count > 0)
{
Item item = null;
p.Items.TryDequeue(out item);
if (item != null)
{
// do something
}
}
System.Threading.Thread.Sleep(50);
}
}
}
class Program
{
static void Main(string[] args)
{
Worker w1 = new Worker();
Worker w2 = new Worker();
w1.Start();
w2.Start();
List<Item> itemsToProcess = new List<Item>();
for (int i = 1; i < 1000; i++)
{
itemsToProcess.Add(new Item(i));
}
for (int i = 0; i < itemsToProcess.Count; i++)
{
w1.EnqueueItem(itemsToProcess[i]);
w2.EnqueueItem(itemsToProcess[i]);
}
}
}
}
I am attempting to model slot machine behavior with parallel programming, using a base class (SlotBase) which will be overridden by classes representing each individual machine.
In the base class, I have a method that (attempts to) invoke a descendant class's method (marked abstract in the base class) that plays a sample and returns results through output parameters, and then, under a mutually exclusive lock, updates the base class's summation variables.
Relevant code samples as follows:
From SlotBase.cs:
#region Member Variables
protected long m_CoinIn;
protected long[] m_CoinOut;
protected string[] m_FeatureList;
protected long[] m_HitCount;
// .. and many more, but redacted for length
#endregion
protected abstract void PlaySample(long sampleSize, out long coinIn, out long[] coinOut, out long[] hitCount);
protected override void DoSimulation() {
// No need to initialize these, as the calling routine will, and only it knows how big the arrays need to be, and so forth.
object mutex = new object();
if (ParallelMode) {
int periodCount = (int)(BatchSize / PeriodSize);
Parallel.For(0, periodCount, delegate(int i) {
long coinIn;
long[] coinOut;
long[] hitCount;
PlaySample(PeriodSize, out coinIn, out coinOut, out hitCount);
lock (mutex) {
Console.WriteLine("Coin in this batch: {0}", coinIn);
m_CoinIn += coinIn;
for (int j = 0; j < m_FeatureList.Length; ++j) {
m_CoinOut[j] += coinOut[j];
m_HitCount[j] += hitCount[j];
}
}
});
}
}
.. and from a typical subclass implementation:
protected override void PlaySample(long sampleSize, out long coinIn, out long[] coinOut, out long[] hitCount) {
switch (WagerIndex) {
case (int)WagerType.Main: {
RNG localRNG = SpawnRNG();
coinIn = 0;
coinOut = new long[m_FeatureList.Length];
hitCount = new long[m_FeatureList.Length];
for (long iter = 0; iter < sampleSize; ++iter) {
coinIn += m_LinesPlayed;
double[] output = GetSpinResults(ref localRNG, (int)SpinMode.MainSpin);
for (int i = 0; i < m_FeatureList.Length; ++i) {
coinOut[i] += (long)output[i];
if (output[i] > 0) ++hitCount[i];
}
}
break;
}
default: {
throw new Exception(string.Format("Wager Type index {0} not supported", WagerIndex));
}
}
}
.. this actually works quite effectively with small values of SampleSize and PeriodSize, but it quickly starts to hang (or virtually hang) as the values increase.
I've tried commenting out the variable updates and the hanging continues in earnest, which suggests that the problem is in fact with the way I'm implementing the Parallel.For loop.
I have no problem tearing this down pretty much from scratch and rebuilding to get it working. My only really important design goal is keeping the same paradigm (one base class, multiple sub-classes implementing the different slots).
Where am I going wrong?
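For what it's worth (not a diagnosis of the hang, just a sketch under the question's own design, showing only the parallel branch), the same accumulation can be written with Parallel.For's thread-local overload (localInit/localFinally), so each worker keeps private totals and the lock is only taken once per worker thread instead of once per period:
private class Totals
{
public long CoinIn;
public long[] CoinOut;
public long[] HitCount;
}
protected override void DoSimulation() {
object mutex = new object();
int periodCount = (int)(BatchSize / PeriodSize);
Parallel.For(0, periodCount,
// localInit: one private accumulator per worker thread
() => new Totals {
CoinOut = new long[m_FeatureList.Length],
HitCount = new long[m_FeatureList.Length]
},
// body: accumulate into the thread-local totals, no lock needed here
(i, loopState, totals) => {
long coinIn;
long[] coinOut;
long[] hitCount;
PlaySample(PeriodSize, out coinIn, out coinOut, out hitCount);
totals.CoinIn += coinIn;
for (int j = 0; j < m_FeatureList.Length; ++j) {
totals.CoinOut[j] += coinOut[j];
totals.HitCount[j] += hitCount[j];
}
return totals;
},
// localFinally: merge each worker's totals under the lock, once per thread
totals => {
lock (mutex) {
m_CoinIn += totals.CoinIn;
for (int j = 0; j < m_FeatureList.Length; ++j) {
m_CoinOut[j] += totals.CoinOut[j];
m_HitCount[j] += totals.HitCount[j];
}
}
});
}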
I have a function like this:
foreach (ListViewItem item in getListViewItems(listView2)) //for proxy
{
if (reader.Peek() == -1)
{
break;
}
lock (reader)
{
line = reader.ReadLine();
}
//proxy code
List<string> mylist = new List<string>();
if (item != null)
{
for (int s = 0; s < 3; s++)
{
if (item.SubItems[s].Text != null)
{
mylist.Add(item.SubItems[s].Text);
}
else
{
mylist.Add("");
}
}
}
else
{
break;
}
//end proxy code
//some other code including the threadpool
}
and the delegate code:
private delegate ListView.ListViewItemCollection GetItems(ListView lstview);
private ListView.ListViewItemCollection getListViewItems(ListView lstview)
{
ListView.ListViewItemCollection temp = new ListView.ListViewItemCollection(new ListView());
if (!lstview.InvokeRequired)
{
foreach (ListViewItem item in lstview.CheckedItems)
{
temp.Add((ListViewItem)item.Clone());
}
return temp;
}
else
{
return (ListView.ListViewItemCollection)this.Invoke(new GetItems(getListViewItems), new object[] { lstview });
}
}
EDIT:
I want to replace that foreach loop in the main function with a conditional function:
if (reader.Peek() == -1)
{
break;
}
lock (reader)
{
line = reader.ReadLine();
}
if (use_proxy == true)
{
mylist2 = get_current_proxy();
}
//some other code including the threadpool
private List<string> get_current_proxy()
{
//what shall I add here?
}
How can I make that function do the same thing as the foreach loop, but using a for loop? I mean getting the proxies one by one...
I see multiple questions revolving around the idea of scraping a website for emails and then spamming. There are very good tools for that already, no need for a new one.
Anyway, I don't understand your question, and it seems that I'm not the only one here, but the thing you'll have to KNOW before anything else is this:
Having ANYTHING in Windows run in multiple threads ultimately has to be synchronized when you do Invoke(), which HAS TO wait until it all passes through the ONE thread that holds the message loop. So you can try to read from or write to the ListView from multiple threads, but for each read/write you'll have to Invoke() (you probably tried it directly and BAAAAM), and every Invoke() has only ONE hole to go through, so all your threads will have to wait their turn.
Next: using a ListView as a CONTAINER for your data is so BAD I can't even comment any further. Consider something like
class MyData
{
public string Name;
public string URL;
// ...
}
and
List<MyData> _myData;
to hold your data. You'll be able to access it from multiple threads, if you take care of some low-key sync issues.
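A tiny sketch of that low-key sync (AddData and GetData are placeholder names): take one lock object and use it around every read and write of the shared list.
List<MyData> _myData = new List<MyData>();
object _myDataLock = new object();
// every thread takes the lock before touching the list
void AddData(MyData d)
{
lock (_myDataLock) { _myData.Add(d); }
}
MyData GetData(int index)
{
lock (_myDataLock) { return _myData[index]; }
}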
Lastly, how come you're asking us questions about .NET C# programming if you don't even know the syntax? Well, it's rhetorical, ...