Unexpected behavior of static variable initialization - c#

I'm not very familiar with WinRT, and I'm encountering unexpected behavior. I have a static variable _Verses that is initialized in the static constructor of its class. The expected behavior is that _Verses is initialized before the first reference to any static method, as explained in When is a static constructor called in C#?
But when I call the static async function LoadData (WinRT), I get this exception:
Object reference not set to an instance of an object.
My code is:
public class VerseCollection
{
public const int TotalVerses = 6236;
static Verse[] _Verses;
static VerseCollection()
{
_Verses = new Verse[TotalVerses];
}
internal static async void LoadData(StorageFile file)
{
using (var reader = new BinaryReader(await file.OpenStreamForReadAsync()))
{
int wId = 0;
for (int i = 0; i < VerseCollection.TotalVerses; i++)
{
var retValue = new string[reader.ReadInt32()];
for (int j = 0; j < retValue.Length; j++)
retValue[j] = reader.ReadString();
_Verses[i] = new Verse(i, wId, retValue);
wId += _Verses[i].Words.Count;
}
}
}
}
public class Book
{
public static async Task<Book> CreateInstance()
{
VerseCollection.LoadData(await DigitalQuranDirectories.Data.GetFileAsync("quran-uthmani.bin"));
}
}
I call the function CreateInstance as:
async void DoInit()
{
await DigitalQuran.Book.CreateInstance();
}
The same code works on desktop but not on WinRT. The full code of the Book class for desktop is here, and the VerseCollection class is here.
EDIT:
The complete code is:
public class Book : VerseSpan
{
public static async Task<Book> CreateInstance()
{
_Instance = new Book();
VerseCollection.LoadData(await DigitalQuranDirectories.Data.GetFileAsync("quran-uthmani.bin"));
PrivateStorage.LoadQuranObjectsFromMetadata();
// Some Other Operations too
return _Instance;
}
}
public class VerseCollection
{
static Verse[] _Verses = new Verse[TotalVerses];
internal static async void LoadData(StorageFile file)
{
using (var reader = new BinaryReader(await file.OpenStreamForReadAsync()))
{
int wId = 0;
for (int i = 0; i < VerseCollection.TotalVerses; i++)
{
var retValue = new string[reader.ReadInt32()];
for (int j = 0; j < retValue.Length; j++)
retValue[j] = reader.ReadString();
_Verses[i] = new Verse(i, wId, retValue);
wId += _Verses[i].Words.Count;
}
}
}
}
public class Verse
{
public Verse(int number, int firstWordIndex, string[] words)
{
GlobalNumber = number + 1;
Words = new WordCollection(firstWordIndex, words, this);
}
}
public class WordCollection : ReadOnlyCollection<Word>
{
public const int TotalWords = 77878;
static Word[] _Words = new Word[TotalWords];
static string[] _WordsText = new string[TotalWords];
public WordCollection(int startIndex, int count)
: base(count)
{
this.startIndex = startIndex;
}
internal WordCollection(int startId, string[] words, Verse verse) : this(startId, words.Length)
{
int max = words.Length + startId;
for (int i = startId; i < max; i++)
{
_Words[i] = new Word(i, verse);
_WordsText[i] = words[i - startId];
}
}
}
public abstract class ReadOnlyCollection<T> : IEnumerable<T>
{
public ReadOnlyCollection(int count)
{
Count = count;
}
}
public class PrivateStorage
{
internal static async void LoadQuranObjectsFromMetadata()
{
using (var reader = new BinaryReader(await (await DigitalQuranDirectories.Data.GetFileAsync(".metadata")).OpenStreamForReadAsync()))
{
/* 1 */ ChapterCollection.LoadData(EnumerateChapters(reader));
/* 2 */ PartCollection.LoadData(EnumerateParts(reader));
/* Some other tasks */
}
}
static IEnumerator<ChapterMeta> EnumerateChapters(BinaryReader reader)
{
for (int i = 0; i < ChapterCollection.TotalChapters; i++)
{
yield return new ChapterMeta()
{
StartVerse = reader.ReadInt32(),
VerseCount = reader.ReadInt32(),
BowingCount = reader.ReadInt32(),
Name = reader.ReadString(),
EnglishName = reader.ReadString(),
TransliteratedName = reader.ReadString(),
RevelationPlace = (RevelationPlace)reader.ReadByte(),
RevelationOrder = reader.ReadInt32()
};
}
}
static IEnumerator<PartMeta> EnumerateParts(BinaryReader reader)
{
for (int i = 0; i < PartCollection.TotalParts; i++)
{
yield return new PartMeta()
{
StartVerse = reader.ReadInt32(),
VerseCount = reader.ReadInt32(),
ArabicName = reader.ReadString(),
TransliteratedName = reader.ReadString()
};
}
}
}
public class ChapterCollection : ReadOnlyCollection<Chapter>
{
public const int TotalChapters = 114;
static Chapter[] _Chapters = new Chapter[TotalChapters];
internal static void LoadData(IEnumerator<ChapterMeta> e)
{
for (int i = 0; i < TotalChapters; i++)
{
e.MoveNext();
_Chapters[i] = new Chapter(i, e.Current);
}
}
}
public class PartCollection : ReadOnlyCollection<Part>
{
public const int TotalParts = 30;
static Part[] _Parts = new Part[TotalParts];
internal static void LoadData(IEnumerator<PartMeta> e)
{
for (int i = 0; i < TotalParts; i++)
{
e.MoveNext();
_Parts[i] = new Part(i, e.Current);
}
}
}
When I run the code with the debugger, no exception is raised. Furthermore, after the exception, Visual Studio sometimes points to class VerseCollection, function LoadData, on the line _Verses[i] = new Verse(i, wId, retValue); (_Verses is null), and sometimes to class ChapterCollection, function LoadData, on the line _Chapters[i] = new Chapter(i, e.Current); (_Chapters is null).

The issue was with the asynchronous calls. File reading is an asynchronous operation in WinRT, and an async method with a void return type cannot be awaited. So the caller continues with the next instructions without waiting for the previous operation to complete, while that operation keeps running separately. This leads to the NullReferenceException.
I solved the problem by changing the return type of all async operations from void to Task and calling them with await, as in the code below.
public class Book : VerseSpan
{
public static async Task<Book> CreateInstance()
{
_Instance = new Book();
await VerseCollection.LoadData(await DigitalQuranDirectories.Data.GetFileAsync("quran-uthmani.bin"));
await PrivateStorage.LoadQuranObjectsFromMetadata();
// Some Other Operations too
return _Instance;
}
}
public class VerseCollection
{
static Verse[] _Verses = new Verse[TotalVerses];
internal static async Task LoadData(StorageFile file)
{
using (var reader = new BinaryReader(await file.OpenStreamForReadAsync()))
{
int wId = 0;
for (int i = 0; i < VerseCollection.TotalVerses; i++)
{
var retValue = new string[reader.ReadInt32()];
for (int j = 0; j < retValue.Length; j++)
retValue[j] = reader.ReadString();
_Verses[i] = new Verse(i, wId, retValue);
wId += _Verses[i].Words.Count;
}
}
}
}
public class PrivateStorage
{
internal static async Task LoadQuranObjectsFromMetadata()
{
using (var reader = new BinaryReader(await (await DigitalQuranDirectories.Data.GetFileAsync(".metadata")).OpenStreamForReadAsync()))
{
/* Some tasks */
}
}
}
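The difference between async void and async Task described above can be reproduced without WinRT. The following is only a minimal sketch, not the project's code: the names AsyncVoidDemo, LoadVoidAsync and LoadTaskAsync are hypothetical, and a 200 ms Task.Delay stands in for the asynchronous file read.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncVoidDemo
{
    // Stands in for the data that the loader fills in (hypothetical).
    public static string Data;

    // async void: the caller cannot await this, so it gets control back
    // at the first await, before Data has been assigned.
    public static async void LoadVoidAsync()
    {
        await Task.Delay(200); // simulated asynchronous file read
        Data = "loaded";
    }

    // async Task: the caller can await completion.
    public static async Task LoadTaskAsync()
    {
        await Task.Delay(200); // simulated asynchronous file read
        Data = "loaded";
    }

    public static async Task Main()
    {
        Data = null;
        LoadVoidAsync(); // returns immediately; nothing to await
        Console.WriteLine(Data == null
            ? "async void: data not loaded yet"
            : "async void: data already loaded");

        await Task.Delay(500); // let the fire-and-forget call finish
        Data = null;
        await LoadTaskAsync(); // resumes only after Data is assigned
        Console.WriteLine(Data == null
            ? "async Task: data not loaded yet"
            : "async Task: data already loaded");
    }
}
```

In this sketch the code following the async void call is expected to see Data still unset, which mirrors how later steps in CreateInstance could run before LoadData had finished.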

Because it runs on desktop but not on WinRT, I believe there is an issue with your asynchronous call. Because you are doing this asynchronously, there is no guarantee that the constructor (static or not) will have finished running before the call to LoadData. Make sure that your constructor has finished executing before calling the LoadData function, and this should give you consistent behaviour.

Using HttpListener to handle more concurrent requests

Goal: to be able to handle thousands of requests per second
Current result: while the test code was at request 15k, the HTTP server had only processed about 300-400 requests
In the first iteration of this code I set HTTP_HANDLER_THREADS to 2; after about 200 processed requests the server was overloaded and crashed.
I then raised this number to 5, with similar results.
I then raised the number to 5000 and got to around 800. It seems I am doing something very wrong, because there is no way my system is running 5000 threads: running top -H -p <pid> showed me the thread pool, and it did not open 5000 threads.
I am very confused and would like help adjusting this to handle thousands of requests.
TestCase
class Program
{
private static readonly HttpClient client = new HttpClient();
private static int send_amount = 200000;
private static int sent_request = 0;
static void Main(string[] args)
{
Parallel.For(0, send_amount, i =>
{
var values = new Dictionary<string, string>{
{ "request", i.ToString() }
};
var content = new FormUrlEncodedContent(values);
var response = client.PostAsync("http://192.168.102.165:1990", content);
sent_request += 1;
Console.WriteLine($"Sent request {sent_request}/{send_amount}");
});
}
}
Constants
internal class Constants
{
public static bool IsDebugMode = false;
public const string PRODUCTION_IP = "192.168.102.165";
public const string DEVELOPMENT_IP = "192.168.102.165";
public const Int32 HTTP_PORT = 1990;
public const Int32 HTTPS_PORT = 1990;
public const Int32 HTTP_HANDLER_THREADS = 5000;
public static string[] SERVER_BINDS
{
get
{
var server_binds = new string[]
{
$"http://{(IsDebugMode ? DEVELOPMENT_IP : PRODUCTION_IP)}:{HTTP_PORT}/"
};
return server_binds;
}
}
}
Http server class
internal class HTTPServer
{
private readonly ProcessDataDelegate handler;
private readonly HttpListener listener;
public HTTPServer(HttpListener listener, string[] prefixes, ProcessDataDelegate handler)
{
this.listener = listener;
this.handler = handler;
for (var i = 0; i < prefixes.Length; i++)
{
listener.Prefixes.Add(prefixes[i]);
}
}
public void Start()
{
if (listener.IsListening)
{
return;
}
listener.Start();
for (var i = 0; i < Constants.HTTP_HANDLER_THREADS; i++)
{
listener.GetContextAsync().ContinueWith(ProcessRequestHandler);
}
}
public void Stop()
{
if (listener.IsListening)
{
listener.Stop();
}
}
private void ProcessRequestHandler(Task<HttpListenerContext> result)
{
var context = result.Result;
if (!listener.IsListening) return;
//Start a new listener which will replace this
listener.GetContextAsync().ContinueWith(ProcessRequestHandler);
//Read request
var request = new StreamReader(context.Request.InputStream).ReadToEnd();
//Prepare response
var response_bytes = handler.Invoke(request);
context.Response.ContentLength64 = response_bytes.Length;
var output = context.Response.OutputStream;
output.WriteAsync(response_bytes, 0, response_bytes.Length);
output.Close();
}
}
Program
class Program
{
private static Thread _console_thread;
private static int request_amount = 0;
static void Main(string[] args)
{
InitConsoleThread();
InitHttpServer();
}
private static void InitHttpServer()
{
var http_listener = new HttpListener();
var http_server = new HTTPServer.HTTPServer(http_listener, Constants.SERVER_BINDS, ProcessResponse);
http_server.Start();
}
private static byte[] ProcessResponse(string response)
{
request_amount += 1;
Console.WriteLine(request_amount);
//Sleep was added here to simulate the code doing something
Thread.Sleep(2000);
return new byte[0];
}
private static void InitConsoleThread()
{
_console_thread = new Thread(ConsoleLoop)
{
Name = "ConsoleThread"
};
_console_thread.Start();
}
private static void ConsoleLoop()
{
SpinWait.SpinUntil(() => false);
}
}

How to create a last file with the remaining messages when the batch size has not reached 240

In the application below,
the Producer method adds messages to a blocking collection.
In the Consumer method, I consume the blocking collection and add the messages to a list; when the size is >= 240, I write that list to a JSON file.
At some point there are no new messages in the blocking collection, but the Consumer still holds a list of messages whose size is below 240. In this case the app never writes the rest of the data to a new JSON file.
How can I let the Consumer know that no new messages are coming, so that it writes whatever is left to a new file?
Is this possible? Say the Consumer waits for 1 minute, and if there are no new messages, it writes whatever is left to a new file?
Here is the code (I'm adding 11 messages; by message 9 the batch size reaches 240 and a file is generated, but messages 10 and 11 are never written to a new file):
class Program
{
private static List<Batch> batchList = new List<Batch>();
private static BlockingCollection<Message> messages = new BlockingCollection<Message>();
private static int maxbatchsize = 240;
private static int currentsize = 0;
private static void Producer()
{
int ctr = 1;
while (ctr <= 11)
{
messages.Add(new Message { Id = ctr, Name = $"Name-{ctr}" });
Thread.Sleep(1000);
ctr++;
}
}
private static void Consumer()
{
foreach (var message in messages.GetConsumingEnumerable())
{
var msg = JsonConvert.SerializeObject(message);
Console.WriteLine(msg);
if (currentsize + msg.Length >= maxbatchsize)
{
WriteToFile(batchList);
}
batchList.Add(new Batch { Message = message });
currentsize += msg.Length;
}
}
private static void WriteToFile(List<Batch> batchList)
{
using (StreamWriter outFile = System.IO.File.CreateText(Path.Combine(@"C:\TEMP", $"{DateTime.Now.ToString("yyyyMMddHHmmssfff")}.json")))
{
outFile.Write(JsonConvert.SerializeObject(batchList));
}
batchList.Clear();
currentsize = 0;
}
static void Main(string[] args)
{
var producer = Task.Factory.StartNew(() => Producer());
var consumer = Task.Factory.StartNew(() => Consumer());
Console.Read();
}
}
}
Supporting classes,
public class Message
{
public int Id { get; set; }
public string Name { get; set; }
}
public class Batch
{
public Message Message { get; set; }
}
Update:
class Program
{
private static readonly List<Batch> BatchList = new List<Batch>();
private static readonly BlockingCollection<Message> Messages = new BlockingCollection<Message>();
private const int Maxbatchsize = 240;
private static int _currentsize;
private static void Producer()
{
int ctr = 1;
while (ctr <= 11)
{
Messages.Add(new Message { Id = ctr, Name = $"Name-{ctr}" });
Thread.Sleep(1000);
ctr++;
}
Messages.CompleteAdding();
}
private static void Consumer()
{
foreach (var message in Messages.GetConsumingEnumerable())
{
if (_currentsize >= Maxbatchsize)
{
var listToWrite = new Batch[BatchList.Count];
BatchList.CopyTo(listToWrite);
BatchList.Clear();
_currentsize = 0;
WriteToFile(listToWrite.ToList());
}
else
{
Thread.Sleep(1000);
if (Messages.IsAddingCompleted)
{
var remainSize = Messages.Select(JsonConvert.SerializeObject).Sum(x => x.Length);
if (remainSize == 0)
{
var lastMsg = JsonConvert.SerializeObject(message);
BatchList.Add(new Batch { Message = message });
_currentsize += lastMsg.Length;
Console.WriteLine(lastMsg);
var additionListToWrite = new Batch[BatchList.Count];
BatchList.CopyTo(additionListToWrite);
BatchList.Clear();
_currentsize = 0;
WriteToFile(additionListToWrite.ToList());
break;
}
}
}
var msg = JsonConvert.SerializeObject(message);
BatchList.Add(new Batch { Message = message });
_currentsize += msg.Length;
Console.WriteLine(msg);
}
}
private static void WriteToFile(List<Batch> listToWrite)
{
using (StreamWriter outFile = System.IO.File.CreateText(Path.Combine(@"C:\TEMP", $"{DateTime.Now.ToString("yyyyMMddHHmmssfff")}.json")))
{
outFile.Write(JsonConvert.SerializeObject(listToWrite));
}
}
static void Main(string[] args)
{
var producer = Task.Factory.StartNew(() => Producer());
var consumer = Task.Factory.StartNew(() => Consumer());
Console.Read();
}
}
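The "wait for a while, then write whatever is left" idea from the question can also be sketched with BlockingCollection<T>.TryTake and a timeout. This is a hedged illustration rather than a drop-in replacement: the class TimedBatchConsumer and its Consume method are hypothetical, plain strings stand in for the serialized JSON messages, and an in-memory list of batches stands in for the files written by WriteToFile.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class TimedBatchConsumer
{
    // Drains the collection and flushes the current batch when it reaches
    // maxBatchSize characters, when no message arrives within idleTimeout,
    // or when the producer has called CompleteAdding.
    public static List<List<string>> Consume(
        BlockingCollection<string> messages, int maxBatchSize, TimeSpan idleTimeout)
    {
        var written = new List<List<string>>(); // stands in for the JSON files
        var batch = new List<string>();
        int currentSize = 0;

        void Flush()
        {
            if (batch.Count == 0) return;
            written.Add(batch);
            batch = new List<string>();
            currentSize = 0;
        }

        while (!messages.IsCompleted)
        {
            // Returns false on timeout, or at once if adding is completed
            // and the collection is empty.
            bool got = messages.TryTake(out string msg, idleTimeout);
            if (got)
            {
                batch.Add(msg);
                currentSize += msg.Length;
            }
            if (!got || currentSize >= maxBatchSize) Flush();
        }
        Flush(); // whatever is left after the producer finished
        return written;
    }

    public static void Main()
    {
        var messages = new BlockingCollection<string>();
        var producer = Task.Run(() =>
        {
            for (int i = 1; i <= 11; i++)
            {
                messages.Add($"Name-{i}");
                Thread.Sleep(10);
            }
            messages.CompleteAdding();
        });

        var batches = Consume(messages, 40, TimeSpan.FromSeconds(1));
        producer.Wait();
        foreach (var b in batches)
            Console.WriteLine($"batch with {b.Count} message(s)");
    }
}
```

The final Flush after the loop matters: the loop can exit right after the last message is taken, so without it the trailing partial batch would be lost, which is the symptom described in the question.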

1 out of N threads never joining

I have a thread pool implementation where, whenever I try to stop/join the pool, there is always one random thread that will not stop (state == Running) when I call Stop() on the pool.
I cannot see why: I only have one lock, and in Stop I notify whoever might be blocked waiting in Dequeue with Monitor.PulseAll. The debugger clearly shows that most of them got the message; it is just always 1 out of N that is still running...
Here is a minimal implementation of the pool
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace MultiThreading
{
public class WorkerHub
{
private readonly object _listMutex = new object();
private readonly Queue<TaskWrapper> _taskQueue;
private readonly List<Thread> _threads;
private int _runCondition;
private readonly Dictionary<string, int> _statistics;
public WorkerHub(int count = 4)
{
_statistics = new Dictionary<string, int>();
_taskQueue = new Queue<TaskWrapper>();
_threads = new List<Thread>();
InitializeThreads(count);
}
private bool ShouldRun
{
get => Interlocked.CompareExchange(ref _runCondition, 1, 1) == 1;
set
{
if (value)
Interlocked.CompareExchange(ref _runCondition, 1, 0);
else
Interlocked.CompareExchange(ref _runCondition, 0, 1);
}
}
private void InitializeThreads(int count)
{
Action threadHandler = () =>
{
while (ShouldRun)
{
var wrapper = Dequeue();
if (wrapper != null)
{
wrapper.FunctionBinding.Invoke();
_statistics[Thread.CurrentThread.Name] += 1;
}
}
};
for (var i = 0; i < count; ++i)
{
var t = new Thread(() => { threadHandler.Invoke(); });
t.Name = $"WorkerHub Thread#{i}";
_statistics[t.Name] = 0;
_threads.Add(t);
}
}
public Task Enqueue(Action work)
{
var tcs = new TaskCompletionSource<bool>();
var wrapper = new TaskWrapper();
Action workInvoker = () =>
{
try
{
work.Invoke();
tcs.TrySetResult(true);
}
catch (Exception e)
{
tcs.TrySetException(e);
}
};
Action workCanceler = () => { tcs.TrySetCanceled(); };
wrapper.FunctionBinding = workInvoker;
wrapper.CancelBinding = workCanceler;
lock (_taskQueue)
{
_taskQueue.Enqueue(wrapper);
Monitor.PulseAll(_taskQueue);
}
return tcs.Task;
}
private TaskWrapper Dequeue()
{
lock (_listMutex)
{
while (_taskQueue.Count == 0)
{
if (!ShouldRun)
return null;
Monitor.Wait(_listMutex);
}
_taskQueue.TryDequeue(out var wrapper);
return wrapper;
}
}
public void Stop()
{
ShouldRun = false;
//Wake up whoever is waiting for dequeue
lock (_listMutex)
{
Monitor.PulseAll(_listMutex);
}
foreach (var thread in _threads)
{
thread.Join();
}
var sum = _statistics.Sum(pair => pair.Value) * 1.0;
foreach (var stat in _statistics)
{
Console.WriteLine($"{stat.Key} ran {stat.Value} functions, {stat.Value/sum * 100} percent of the total.");
}
}
public void Start()
{
ShouldRun = true;
foreach (var thread in _threads) thread.Start();
}
}
}
With a test run
public static async Task Main(string[] args)
{
var hub = new WorkerHub();
var tasks = Enumerable.Range(0, (int) 100).Select(x => hub.Enqueue(() => Sum(x)))
.ToArray();
var sw = new Stopwatch();
sw.Start();
hub.Start();
await Task.WhenAll(tasks);
hub.Stop();
sw.Start();
Console.WriteLine($"Work took: {sw.ElapsedMilliseconds}ms.");
}
public static int Sum(int n)
{
var sum = 0;
for (var i = 0; i <= n; ++i) sum += i;
Console.WriteLine($"Sum of numbers up to {n} is {sum}");
return sum;
}
Am I missing something fundamental? Please note this is not production code (phew) but stuff I am just messing around with, so you might find more than 1 issue :)
I wasn't able to repro your MCVE at first because I ran it in a non-async Main()...
If you view the 'Threads' debug window at the call to hub.Stop(); you should see that execution has switched to one of your worker threads: the code after await Task.WhenAll(tasks) most likely resumes on the worker that completed the last task, so that worker is busy running Stop() itself. This is why one worker thread does not respond.
I think it's related to the problem described here.
Switching Enqueue(Action work) to use TaskCreationOptions.RunContinuationsAsynchronously should fix it:
var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
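To see the effect of that option in isolation, here is a small sketch. The class ContinuationDemo and the method ResumesOnCompletingThread are hypothetical names used only for this illustration; the method completes a TaskCompletionSource on a dedicated thread and reports whether the awaiting code resumed on that same thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // Completes a TaskCompletionSource on a dedicated thread and reports
    // whether the code after the await resumed on that same thread.
    public static async Task<bool> ResumesOnCompletingThread(TaskCreationOptions options)
    {
        var tcs = new TaskCompletionSource<bool>(options);

        var worker = new Thread(() =>
        {
            Thread.Sleep(100);      // give the caller time to reach the await
            tcs.TrySetResult(true); // continuations may run inline here
        });
        worker.Start();

        await tcs.Task;
        return ReferenceEquals(Thread.CurrentThread, worker);
    }

    public static async Task Main()
    {
        bool inlineByDefault = await ResumesOnCompletingThread(TaskCreationOptions.None);
        bool inlineWithOption = await ResumesOnCompletingThread(
            TaskCreationOptions.RunContinuationsAsynchronously);

        Console.WriteLine($"default: resumed on completing thread = {inlineByDefault}");
        Console.WriteLine($"RunContinuationsAsynchronously: resumed on completing thread = {inlineWithOption}");
    }
}
```

With the default options the continuation typically runs inline on the thread that calls TrySetResult, which is how a pool worker can end up executing hub.Stop(). With RunContinuationsAsynchronously the continuation is queued elsewhere, so the worker returns to its loop and can be joined.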
[Edit]
Probably a better way to avoid the problem is to swap out the direct thread management to use tasks (this isn't a proper drop-in replacement for your current code, just want to show the idea):
public class TaskWorkerHub
{
ConcurrentQueue<Action> workQueue = new ConcurrentQueue<Action>();
int concurrentTasks;
CancellationTokenSource cancelSource;
List<Task> workers = new List<Task>();
private async Task Worker(CancellationToken cancelToken)
{
while (workQueue.TryDequeue(out var workTuple))
{
await Task.Run(workTuple, cancelToken);
}
}
public TaskWorkerHub(int concurrentTasks = 4)
{
this.concurrentTasks = concurrentTasks;
}
public void Enqueue(Action work) => workQueue.Enqueue(work);
public void Start()
{
cancelSource = new CancellationTokenSource();
for (int i = 0; i < concurrentTasks; i++)
{
workers.Add(Worker(cancelSource.Token));
}
}
public void Stop() => cancelSource.Cancel();
public Task WaitAsync() => Task.WhenAll(workers);
}

Read x number of lines of a file at a time C#

I want to read and process 10+ lines at a time from multi-GB files, but I haven't found a solution that keeps handing out 10 lines at a time until the end of the file.
My last attempt was:
int n = 10;
foreach (var line in File.ReadLines("path")
.AsParallel().WithDegreeOfParallelism(n))
{
System.Console.WriteLine(line);
Thread.Sleep(1000);
}
I've seen solutions that use buffer sizes but I want to read in the entire row.
The default behaviour is to read all the lines in one shot. If you want to read fewer than that, you need to dig a little deeper into how they are read and get a StreamReader, which then lets you control the reading process:
using (StreamReader sr = new StreamReader(path))
{
while (sr.Peek() >= 0)
{
Console.WriteLine(sr.ReadLine());
}
}
It also has a ReadLineAsync method that returns a task.
If you keep these tasks in a ConcurrentBag, you can very easily keep the processing running on 10 lines at a time.
var bag = new ConcurrentBag<Task>();
using (StreamReader sr = new StreamReader(path))
{
while(sr.Peek() >=0)
{
if(bag.Count < 10)
{
Task processing = sr.ReadLineAsync().ContinueWith( (read) => {
string s = read.Result;//EDIT Removed await to reflect Scots comment
//process line
});
bag.Add(processing);
}
else
{
Task.WaitAny(bag.ToArray());
//remove completed tasks from bag
}
}
}
Note that this code is for guidance only and is not meant to be used as is.
If all you want is the last ten lines, you can get them with the solution here:
How to read a text file reversely with iterator in C#
This method would create "pages" of lines from your file.
public static IEnumerable<string[]> ReadFileAsLinesSets(string fileName, int setLen = 10)
{
using (var reader = new StreamReader(fileName))
while (!reader.EndOfStream)
{
var set = new List<string>();
for (var i = 0; i < setLen && !reader.EndOfStream; i++)
{
set.Add(reader.ReadLine());
}
yield return set.ToArray();
}
}
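On .NET 6 or later, the same paging can be sketched with the built-in Enumerable.Chunk operator combined with the lazy File.ReadLines. This is an alternative to the method above rather than part of the original answer; the class LineBatches, the helper ReadInBatches and the temporary demo file are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineBatches
{
    // Lazily yields arrays of up to batchSize lines. The file is streamed,
    // so only the current batch is held in memory.
    public static IEnumerable<string[]> ReadInBatches(string path, int batchSize = 10)
    {
        return File.ReadLines(path).Chunk(batchSize);
    }

    public static void Main()
    {
        // Build a small demo file with 25 lines (hypothetical data).
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, Enumerable.Range(1, 25).Select(i => $"line {i}"));

        foreach (string[] batch in ReadInBatches(path))
        {
            Console.WriteLine($"{batch.Length} lines, first: {batch[0]}");
        }

        File.Delete(path);
    }
}
```

For the 25-line demo file this yields batches of 10, 10 and 5 lines, so the final partial batch is not dropped.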
... More fun version...
class Example
{
static void Main(string[] args)
{
"YourFile.txt".ReadAsLines()
.AsPaged(10)
.Select(a=>a.ToArray()) //required or else you will get random data since "WrappedEnumerator" is not thread safe
.AsParallel()
.WithDegreeOfParallelism(10)
.ForAll(a =>
{
//Do your work here.
Console.WriteLine(a.Aggregate(new StringBuilder(),
(sb, v) => sb.AppendFormat("{0:000000} ", v),
sb => sb.ToString()));
});
}
}
public static class ToolsEx
{
public static IEnumerable<IEnumerable<T>> AsPaged<T>(this IEnumerable<T> items,
int pageLength = 10)
{
using (var enumerator = new WrappedEnumerator<T>(items.GetEnumerator()))
while (!enumerator.IsDone)
yield return enumerator.GetNextPage(pageLength);
}
public static IEnumerable<T> GetNextPage<T>(this IEnumerator<T> enumerator,
int pageLength = 10)
{
for (var i = 0; i < pageLength && enumerator.MoveNext(); i++)
yield return enumerator.Current;
}
public static IEnumerable<string> ReadAsLines(this string fileName)
{
using (var reader = new StreamReader(fileName))
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
internal class WrappedEnumerator<T> : IEnumerator<T>
{
public WrappedEnumerator(IEnumerator<T> enumerator)
{
this.InnerEnumerator = enumerator;
this.IsDone = false;
}
public IEnumerator<T> InnerEnumerator { get; private set; }
public bool IsDone { get; private set; }
public T Current { get { return this.InnerEnumerator.Current; } }
object System.Collections.IEnumerator.Current { get { return this.Current; } }
public void Dispose()
{
this.InnerEnumerator.Dispose();
this.IsDone = true;
}
public bool MoveNext()
{
var next = this.InnerEnumerator.MoveNext();
this.IsDone = !next;
return next;
}
public void Reset()
{
this.IsDone = false;
this.InnerEnumerator.Reset();
}
}

not able to fetch links from pagination using watin dll

Hi, I am collecting URLs using the WatiN framework. I want to traverse all the pages, collect the links, and save them in one text file. I don't know how to add the pagination function. Here is my code.
using System.Text;
using System.Threading.Tasks;
using WatiN.Core;
namespace magicbricks
{
class Class1
{
[STAThread]
static void Main(string[] args)
{
IE ie = new IE();
ie.GoTo("http://www.99acres.com/property-in-chennai-ffid?search_type=QS&search_location=HP&lstAcn=HP_R&src=CLUSTER&isvoicesearch=N&keyword_suggest=chennai%20%28all%29%3B&fullSelectedSuggestions=chennai%20%28all%29&strEntityMap=W3sidHlwZSI6ImNpdHkifSx7IjEiOlsiY2hlbm5haSAoYWxsKSIsIkNJVFlfMzIsIFBSRUZFUkVOQ0VfUywgUkVTQ09NX1IiXX1d&texttypedtillsuggestion=chennai&refine_results=Y&Refine_Localities=Refine%20Localities&action=%2Fdo%2Fquicksearch%2Fsearch&suggestion=CITY_32%2C%20PREFERENCE_S%2C%20RESCOM_R");
foreach (var currLink in ie.Links)
{
if (currLink.Url.Contains("b"))
{
Console.WriteLine(currLink.Url);
}
}
Console.ReadLine();
}
}
}
Any help will be appreciated.
Here is a working solution for that. I changed your code a bit.
using System;
using WatiN.Core;
namespace magicbricks
{
static class Class1
{
private static WatiN.Core.Link _nextPageElement;
private static string _firstPartOfAddress = "";
private static string _lastPartOfAddress = "";
private static int _maxPageCounter = 0;
[STAThread]
static void Main(string[] args)
{
IE ie = SetUpBrowser();
EnterFirstWebpage(ie);
ie.WaitForComplete();
LookFoAllLinks(ie);
for (int i = 2; i < _maxPageCounter; i++)
{
Console.WriteLine("----------------------------Next Page {0}---------------------------", i);
Console.WriteLine(AssembleNextPageWebAddress(i));
EnterNextWebpageUrl(ie,AssembleNextPageWebAddress(i));
LookFoAllLinks(ie);
}
Console.ReadKey();
}
private static IE SetUpBrowser()
{
IE ie = new IE();
return ie;
}
private static void EnterFirstWebpage(IE ie)
{
ie.GoTo("http://www.99acres.com/property-in-chennai-ffid?search_type=QS&search_location=HP&lstAcn=HP_R&src=CLUSTER&isvoicesearch=N&keyword_suggest=chennai%20%28all%29%3B&fullSelectedSuggestions=chennai%20%28all%29&strEntityMap=W3sidHlwZSI6ImNpdHkifSx7IjEiOlsiY2hlbm5haSAoYWxsKSIsIkNJVFlfMzIsIFBSRUZFUkVOQ0VfUywgUkVTQ09NX1IiXX1d&texttypedtillsuggestion=chennai&refine_results=Y&Refine_Localities=Refine%20Localities&action=%2Fdo%2Fquicksearch%2Fsearch&suggestion=CITY_32%2C%20PREFERENCE_S%2C%20RESCOM_R");
}
private static void EnterNextWebpageUrl(IE ie,string url)
{
ie.GoTo(url);
ie.WaitForComplete();
}
private static void LookFoAllLinks(IE ie)
{
int currentpageCounter = 0;
var tmpUrl = string.Empty;
const string nextPageUrl = "http://www.99acres.com/property-in-chennai-ffid-page-";
foreach (var currLink in ie.Links)
{
if (currLink.Url.Contains("b"))
{
Console.WriteLine(currLink.Url);
try
{
if (currLink.Name.Contains("nextbutton"))
{
_nextPageElement = currLink;
}
}
catch (Exception ex)
{
}
try
{
if (currLink.GetAttributeValue("name").Contains("page"))
{
_firstPartOfAddress = currLink.Url.Substring(0, nextPageUrl.Length);
tmpUrl = currLink.Url.Remove(0,nextPageUrl.Length);
_lastPartOfAddress = tmpUrl.Substring(tmpUrl.IndexOf("?"));
tmpUrl = tmpUrl.Substring(0,tmpUrl.IndexOf("?"));
int.TryParse(tmpUrl, out currentpageCounter);
if (currentpageCounter > _maxPageCounter)
{
_maxPageCounter = currentpageCounter;
currentpageCounter = 0;
}
}
}
catch (Exception)
{
}
}
}
}
private static string AssembleNextPageWebAddress(int pageNumber)
{
return _firstPartOfAddress + pageNumber + _lastPartOfAddress;
}
}
}
Some explanation:
The variable _maxPageCounter holds the maximum number of pages to search for links.
We get it here:
if (currLink.GetAttributeValue("name").Contains("page"))
{
_firstPartOfAddress = currLink.Url.Substring(0, nextPageUrl.Length);
tmpUrl = currLink.Url.Remove(0,nextPageUrl.Length);
_lastPartOfAddress = tmpUrl.Substring(tmpUrl.IndexOf("?"));
tmpUrl = tmpUrl.Substring(0,tmpUrl.IndexOf("?"));
int.TryParse(tmpUrl, out currentpageCounter);
if (currentpageCounter > _maxPageCounter)
{
_maxPageCounter = currentpageCounter;
currentpageCounter = 0;
}
}
Later we just loop through the pages, building the next address each time.
private static string AssembleNextPageWebAddress(int pageNumber)
{
return _firstPartOfAddress + pageNumber + _lastPartOfAddress;
}
We could also use the next button here and click it in a loop.
I hope it was helpful.
