Check whether a website address truly exists in all cases - C#

I came across a problem when I started working on a project where one of the features is checking whether a web address exists or not. That part is solved, but since probably no human being writes addresses like this one: http://address.something, the input has to be adjusted by code, either with a regex pattern or simply with the string.Substring method. But then even nonsense like dadadafafaf gets turned into http://dadadafafaf, and for a link in that format my BT internet provider will serve up a newly created website, which my address-existence-checking method then approves. Is there any way to normalize the user's input address without losing control over it?
The method:
protected bool ZkontrolujExistenciStranky(string link)
{
    try
    {
        var request = WebRequest.Create(link) as HttpWebRequest;
        request.Method = "GET";
        using (var respond = (HttpWebResponse)request.GetResponse())
        {
            ZiskatData(respond);
            return respond.StatusCode == HttpStatusCode.OK;
        }
    }
    catch
    {
        return false;
    }
}
Here is an example:
http://postimg.org/image/a4gpkvtzl/
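A minimal sketch of the kind of input normalization in question, where the requirement that the host contain a dot is an assumption to filter out bare words (tighten the rules as needed):

// Sketch: normalize user input into an absolute http(s) URI, or return null.
// The "host must contain a dot" rule is an assumed filter for bare words
// such as "dadadafafaf"; it is not a full validity check.
private static Uri NormalizeAddress(string input)
{
    if (string.IsNullOrWhiteSpace(input))
        return null;

    string candidate = input.Trim();
    if (!candidate.StartsWith("http://") && !candidate.StartsWith("https://"))
        candidate = "http://" + candidate;

    Uri uri;
    if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
        return null;

    // Reject bare words: require a dotted host name.
    if (!uri.Host.Contains("."))
        return null;

    return uri;
}

Only when NormalizeAddress returns a non-null Uri would ZkontrolujExistenciStranky be called, with uri.AbsoluteUri.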

Related

ASP.NET Core - Identify if Request is Local [duplicate]

In the regular ASP.NET you could do this in a view to determine if the current request was from localhost:
HttpContext.Current.Request.IsLocal
But I can't find something similar in ASP.NET 6/Core/whatever it is meant to be called.
UPDATE: ASP.NET Core 2.0 has a method called Url.IsLocalUrl (see the Microsoft Docs).
I think this code will work, but I haven't been able to test it completely
var callingUrl = Request.Headers["Referer"].ToString();
var isLocal = Url.IsLocalUrl(callingUrl);
But see Will Dean's comment below about this approach:
Anyone thinking about using the 'updated' version which checks the Referrer header should bear in mind that headers are extremely easy to spoof, to a degree that doesn't apply to loopback IP addresses.
Original solution
I came across this looking for a solution to know whether a request is local. Unfortunately ASP.NET Core 1.1.0 does not have an IsLocal method on a connection. I found one solution on a web site called Strathweb, but that is out of date too.
I have created my own IsLocal extension and it seems to work. I can't say I have tested it in all circumstances, but you are welcome to try it.
public static class IsLocalExtension
{
    private const string NullIpAddress = "::1";

    public static bool IsLocal(this HttpRequest req)
    {
        var connection = req.HttpContext.Connection;
        if (connection.RemoteIpAddress.IsSet())
        {
            //We have a remote address set up
            return connection.LocalIpAddress.IsSet()
                //If local is the same as remote, then we are local
                ? connection.RemoteIpAddress.Equals(connection.LocalIpAddress)
                //else we are remote if the remote IP address is not a loopback address
                : IPAddress.IsLoopback(connection.RemoteIpAddress);
        }
        return true;
    }

    private static bool IsSet(this IPAddress address)
    {
        return address != null && address.ToString() != NullIpAddress;
    }
}
You call it in a controller action using the Request property, i.e.
public IActionResult YourAction()
{
    var isLocal = Request.IsLocal();
    //... your code here
}
I hope that helps someone.
At the time of writing, HttpContext.Connection.IsLocal is missing from .NET Core.
The other working solutions check only for the first loopback address (::1 or 127.0.0.1), which might not be adequate.
I find the solution below useful:
using Microsoft.AspNetCore.Http;
using System.Net;

namespace ApiHelpers.Filters
{
    public static class HttpContextFilters
    {
        public static bool IsLocalRequest(HttpContext context)
        {
            if (context.Connection.RemoteIpAddress.Equals(context.Connection.LocalIpAddress))
            {
                return true;
            }
            if (IPAddress.IsLoopback(context.Connection.RemoteIpAddress))
            {
                return true;
            }
            return false;
        }
    }
}
And the example use case:
app.UseWhen(HttpContextFilters.IsLocalRequest, configuration => configuration.UseElmPage());
None of the above worked for me.
Url.IsLocalUrl works very differently and I find it a bit useless:
For example, the following URLs are considered local:
/Views/Default/Index.html
~/Index.html
The following URLs are non-local:
../Index.html
http://www.contoso.com/
http://localhost/Index.html
HttpContext.Connection.IsLocal doesn't exist in .Net Core 2.2
Comparing ControllerContext.HttpContext.Connection.RemoteIpAddress and ControllerContext.HttpContext.Connection.LocalIpAddress also doesn't work in my test because I get "::1" for remote ip and "127.0.0.1" for local ip.
Finally, I used this piece:
IPAddress addr = System.Net.IPAddress.Parse(HttpContext.Connection.RemoteIpAddress.ToString());
if (System.Net.IPAddress.IsLoopback(addr))
{
    //do something
}
Late to the party, but if I want to check IsLocal in Razor views in .NET Core 2.2+, I just do this:
@if (Context.Request.Host.Value.StartsWith("localhost"))
{
    //do local stuff
}
UPDATE for ASP.NET Core 3.1
You can use this:
if (Request.Host.Host == "localhost")
{
    // do something
}
I would also mention that it may be useful to add the below clause to the end of your custom IsLocal() check
if (connection.RemoteIpAddress == null && connection.LocalIpAddress == null)
{
    return true;
}
This would account for the scenario where the site is being run using the Microsoft.AspNetCore.TestHost and is running entirely locally in memory without an actual TCP/IP connection.
Now it's
HttpContext.Connection.IsLocal
and if you need to check that outside of a controller, you can take a dependency on IHttpContextAccessor to get access to it, as sketched below.
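A minimal sketch of that dependency, assuming the accessor was registered at startup with services.AddHttpContextAccessor() (the service class name here is hypothetical):

public class LocalRequestChecker
{
    private readonly IHttpContextAccessor _contextAccessor;

    public LocalRequestChecker(IHttpContextAccessor contextAccessor)
    {
        _contextAccessor = contextAccessor;
    }

    public bool CurrentRequestIsLocal()
    {
        // Reuses the IsLocal() extension from the answer above.
        var context = _contextAccessor.HttpContext;
        return context != null && context.Request.IsLocal();
    }
}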
Update based on comment:
HttpContext is intrinsically available in Views
@if (Context.Connection.IsLocal)
{
}

Will this Cookie Checker reliably detect if user is accepting cookies?

I want my page to set cookies on users in order to track when and if they return, and I will be keeping their unique visitor id in a database. I want to avoid creating cookie records for users who do not accept cookies (such as bots and crawlers), so I need a way to check whether they are accepting cookies or not. I've devised the following code.
private bool CookiesAreEnabled()
{
    bool result = false;
    HttpCookie CookieChecker = new HttpCookie("CookieChecker");
    CookieChecker.Value = "Do you see me?";
    CookieChecker.Expires = DateTime.Now.AddSeconds(10.0d);
    Response.Cookies.Set(CookieChecker);

    CookieChecker = new HttpCookie("CookieChecker");
    CookieChecker = Request.Cookies["CookieChecker"];
    if (CookieChecker != null)
    {
        result = true;
        CookieChecker = new HttpCookie("CookieChecker");
        CookieChecker.Value = "";
        CookieChecker.Expires = DateTime.Now;
        Response.Cookies.Set(CookieChecker);
    }
    return result;
}
It seems to me that this should detect that cookies are disabled, but it doesn't! In my testing so far, using Firefox with cookies turned off, the code reports that cookies are enabled! Am I barking up the wrong tree as far as detecting whether cookies are enabled? Or am I making a newbie-style mistake?
Not sure if overall this code is right, but Response.Cookies.Set(CookieChecker); sets a cookie on a response the browser is yet to receive and process. When you immediately afterwards call Request.Cookies["CookieChecker"], this examines the current request you're processing, the one the browser generated before even receiving the Set-Cookie response.
At a minimum you need to let the browser process the response, then examine the next request to see if the cookie is there.
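A minimal sketch of that two-request flow in WebForms (the query-string flag is hypothetical):

// Request 1: set a test cookie, then redirect back to ourselves with a flag.
// Request 2: the flag tells us to check whether the browser returned the cookie.
protected void Page_Load(object sender, EventArgs e)
{
    if (Request.QueryString["cookieCheck"] == null)
    {
        Response.Cookies.Add(new HttpCookie("CookieChecker", "Do you see me?"));
        Response.Redirect(Request.Url.AbsolutePath + "?cookieCheck=1");
    }
    else
    {
        bool cookiesEnabled = Request.Cookies["CookieChecker"] != null;
        // ... record cookiesEnabled, then continue as normal ...
    }
}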
If you want to use that approach, the only way is to use two requests: in the first you set a cookie, in the second you read it. You cannot do it with only one request.
Otherwise I think you can achieve it with JavaScript (this is pseudo code):
/*first you check if there is already a cookie with the identifier*/
if (document.cookie.indexOf('cookiewithuniqueidentifier') == -1) {
    document.cookie = 'testcookie';
    cookieEnabled = (document.cookie.indexOf('testcookie') != -1) ? true : false;
    if (cookieEnabled) {
        /*ajax request to the server for requesting an unique identifier*/
        /*save a cookie with that identifier*/
        document.cookie = 'cookiewithuniqueidentifier=936DA01F-9ABD-4d9d-80C7-02AF85C822A8'
    }
}

Sitecore custom 404 handler in production

I picked up the following code from Stack Overflow / a blog on handling custom 404s in Sitecore (which actually does a 302 redirect to a 404 page with status 200, which gets picked up by Google as a soft 404).
While this works totally fine on our local test servers, the moment we drop it in production the site goes haywire and takes ages, e.g. 8-9 minutes, to load.
public class ExecuteRequest : Sitecore.Pipelines.HttpRequest.ExecuteRequest
{
    protected override void RedirectOnItemNotFound(string url)
    {
        var context = System.Web.HttpContext.Current;
        try
        {
            // Request the NotFound page
            var domain = context.Request.Url.GetComponents(
                UriComponents.Scheme | UriComponents.Host,
                UriFormat.Unescaped);
            var content = WebUtil.ExecuteWebPage(
                string.Concat(domain, url));
            // The line below is required for IIS 7.5 hosted
            // sites or else IIS is gonna display default 404 page
            context.Response.TrySkipIisCustomErrors = true;
            context.Response.StatusCode = 404;
            context.Response.Write(content);
        }
        catch (Exception ex)
        {
            Log.Error(string.Format("Falling back to default redirection behavior. Reason for error {0}", ex), ex);
            // Fall back to default behavior on exceptions
            base.RedirectOnItemNotFound(url);
        }
        context.Response.End();
    }
}
P.S: I then replaced ExecuteRequest with my custom one in web.config.
If you have experienced a similar thing or know of any issue regarding this, please do shed some light.
Thanks in advance
There is a setting in Sitecore, with which you can get rid of the 302 redirect:
<setting name="RequestErrors.UseServerSideRedirect" value="true" />
With this setting, the URL stays the same and the status code is 404. If you want to have some additional logic (like showing a Sitecore item as the error page), there is a Shared Source module called Error Manager on the Sitecore Marketplace.
Hope that helps.
Check if the server is able to access the hostname of your website.
Servers often do not have access to a DNS and therefore are unable to resolve hostnames. In order for your 404 handler to work, the application needs to be able to access its own hostname to request the 404 page.
To be sure this works, edit the hosts file of the server and add an entry for your hostname there, pointing it to 127.0.0.1
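For example, an entry like this in the hosts file (hostname hypothetical):

127.0.0.1    www.yoursite.com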
You can resolve it by creating a new resolver. It is a good solution when you want to give the user an error page in the right language. But there are some differences between IIS 7.0 and 7.5.
Add the processor to your Sitecore configuration:
<processor type="Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel"/>
<processor type="Project.Error404Resolver, Project" />
The processor that resolves it:
For IIS 7.0:
public class Error404Resolver : Sitecore.Pipelines.HttpRequest.HttpRequestProcessor
{
    public override void Process(Sitecore.Pipelines.HttpRequest.HttpRequestArgs args)
    {
        if (Sitecore.Context.Item == null && !args.Context.Request.Url.AbsolutePath.StartsWith("/sitecore"))
        {
            args.Context.Response.Clear();
            SiteContext site = Sitecore.Context.Site;
            if (site != null)
            {
                Item item404Page = Sitecore.Context.Database.GetItem(site.RootPath + "website/error/404");
                if (item404Page != null)
                {
                    Sitecore.Context.Item = item404Page;
                    args.Context.Response.StatusCode = (int)System.Net.HttpStatusCode.NotFound;
                }
            }
        }
    }
}
For IIS 7.5:
public class Error404Resolver : Sitecore.Pipelines.HttpRequest.HttpRequestProcessor
{
    public override void Process(Sitecore.Pipelines.HttpRequest.HttpRequestArgs args)
    {
        if (Sitecore.Context.Item == null && !args.Context.Request.Url.AbsolutePath.StartsWith("/sitecore"))
        {
            args.Context.Response.Clear();
            SiteContext site = Sitecore.Context.Site;
            if (site != null)
            {
                Item item404Page = Sitecore.Context.Database.GetItem(site.RootPath + "website/error/404");
                if (item404Page != null)
                {
                    WebClient webClient = new WebClient();
                    webClient.Encoding = args.Context.Request.ContentEncoding;
                    webClient.Headers.Add("User-Agent", args.Context.Request.UserAgent);
                    string page = webClient.DownloadString(LinkManager.GetItemUrl(item404Page));
                    args.Context.Response.StatusCode = (int)System.Net.HttpStatusCode.NotFound;
                    args.Context.Response.Write(page);
                    args.Context.Response.TrySkipIisCustomErrors = true;
                    args.Context.Response.End();
                }
            }
        }
    }
}
With this you will render the error page at the current URL without a redirect, and return code 404 to the browser.
I have the same issue at a customer I currently work with (looks like the code was pasted), and actually the reason is pretty obvious: if you execute this call with a URL that is not registered in the Sitecore sites config (but accessible via IIS), you will also run through this code. Unfortunately, the WebUtil.ExecuteWebPage call is then executed with the wrong URL as well, hence you end up stuck in a loop.
You should actually see a lot of these messages in your log: Falling back to default redirection behavior. Reason for error {0}, probably with timeouts.
If you really want to use your custom handler, you should check that you are in the right site context before calling WebUtil.ExecuteWebPage, as in the sketch below.
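A minimal sketch of that guard, added to the handler from the question (the expected site name is hypothetical):

protected override void RedirectOnItemNotFound(string url)
{
    // Only run the custom 404 logic for sites we explicitly expect;
    // anything else falls back to the default behaviour, breaking the loop.
    var site = Sitecore.Context.Site;
    if (site == null || site.Name != "website")
    {
        base.RedirectOnItemNotFound(url);
        return;
    }
    // ... custom handling as in the question ...
}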

Awesomium Post Parameters

Currently I am working with Awesomium 1.7 in the C# environment.
I'm just using the Core and trying to define custom POST parameters.
Now, I googled a lot and I even posted on the Awesomium forums, but there was no answer.
I understand the concept, but the recent changes just dropped the suggested mechanism and examples.
What I found:
http://support.awesomium.com/kb/general-use/how-do-i-send-form-values-post-data
The problem with this is that the WebView class does not contain the "OnResourceRequest" event anymore.
So far, I have implemented IResourceInterceptor and have overridden the "OnRequest" function.
public ResourceResponse OnRequest(ResourceRequest request)
is the signature, but I have no way to reach in there in order to add request headers.
Anyone here any idea? I tried to look in the documentation, but I didn't find anything on that.
You need to attach your IResourceInterceptor to WebCore, not WebView. Here's a working example:
Resource interceptor:
public class CustomResourceInterceptor : ResourceInterceptor
{
    protected override ResourceResponse OnRequest(ResourceRequest request)
    {
        request.Method = "POST";

        var bytes = "Appending some text to the request";
        request.AppendUploadBytes(bytes, (uint)bytes.Length);
        request.AppendExtraHeader("custom-header", "this is a custom header");

        return null;
    }
}
Main application:
public MainWindow()
{
    WebCore.Started += WebCoreOnStarted;
    InitializeComponent();
}

private void WebCoreOnStarted(object sender, CoreStartEventArgs coreStartEventArgs)
{
    var interceptor = new CustomResourceInterceptor();
    WebCore.ResourceInterceptor = interceptor;

    //webView is a WebControl on my UI, but you should be able to create your own WebView off WebCore
    webView.Source = new Uri("http://www.google.com");
}
HotN's answer above is good; in fact, it's what I based my answer on. However, I spent a week searching for this information and putting together something that will work. (The answer above has a couple of issues which, at the very least, make it unworkable with v1.7 of Awesomium.) What I was looking for was something that would work right out of the box.
And here is that solution. It needs improvement, but it suits my needs at the moment. I hope this helps someone else.
// CRI.CustomResourceInterceptor
//
// Author: Garison E Piatt
// Contact: {removed}
// Created: 11/17/14
// Version: 1.0.0
//
// Apparently, when Awesomium was first created, the programmers did not understand that someone would
// eventually want to post data from the application. So they made it incredibly difficult to upload
// POST parameters to the remote web site. We have to jump through hoops to get that done.
//
// This module provides that hoop-jumping in a simple-to-understand fashion. We hope. It overrides
// the current resource interceptor (if any), replacing both the OnRequest and OnFilterNavigation
// methods (we aren't using the latter yet).
//
// It also provides settable parameters. Once this module is attached to the WebCore, it is *always*
// attached; therefore, we can simply change the parameters before posting to the web site.
//
// File uploads are currently unhandled, and, once handled, will probably only upload one file. We
// will deal with that issue later.
//
// To incorporate this into your application, follow these steps:
// 1. Add this file to your project. You know how to do that.
// 2. Edit your MainWindow.cs file.
// a. At the top, add:
// using CRI;
// b. inside the main class declaration, near the top, add:
// private CustomResourceInterceptor cri;
// c. In the MainWindow method, add:
// WebCore.Started += OnWebCoreOnStarted;
// cri = new CustomResourceInterceptor();
// and (set *before* you set the Source value for the Web Control):
// cri.Enabled = true;
// cri.Parameters = String.Format("login={0}&password={1}", login, pw);
// (Choose your own parameters, but format them like a GET query.)
// d. Add the following method:
// private void OnWebCoreOnStarted(object sender, CoreStartEventArgs coreStartEventArgs) {
// WebCore.ResourceInterceptor = cri;
// }
// 3. Compile your application. It should work.
using System;
using System.Runtime.InteropServices;
using System.Text;
using Awesomium.Core;
using Awesomium.Windows.Controls;
namespace CRI {

    //* CustomResourceInterceptor
    // This object replaces the standard Resource Interceptor (if any; we still don't know) with something
    // that allows posting data to the remote web site. It overrides both the OnRequest and OnFilterNavigation
    // methods. Public variables allow for run-time configuration.
    public class CustomResourceInterceptor : IResourceInterceptor {

        // Since the default interceptor remains overridden for the remainder of the session, we need to disable
        // the methods herein unless we are actually using them. Note that both methods are disabled by default.
        public bool RequestEnabled = false;
        public bool FilterEnabled = false;

        // These are the parameters we send to the remote site. They are empty by default; another safeguard
        // against sending POST data unnecessarily. Currently, both values allow for only one string. POST
        // variables can be combined (by the caller) into one string, but this limits us to only one file
        // upload at a time. Someday, we will have to fix that. And make it backward-compatible.
        public String Parameters = null;
        public String FilePath = null;

        /** OnRequest
        ** This overrides the default OnRequest method of the standard resource interceptor. It receives
        ** the resource request object as a parameter.
        **
        ** It first checks whether or not it is enabled, and returns NULL if not. Next it sees if any
        ** parameters are defined. If so, it converts them to a byte stream and appends them to the request.
        ** Currently, files are not handled, but we hope to add that someday.
        */
        public ResourceResponse OnRequest(ResourceRequest request) {
            // We do nothing at all if we aren't enabled. This is a stopgap that prevents us from sending
            // POST data with every request.
            if (RequestEnabled == false) return null;

            // If the Parameters are defined, convert them to a byte stream and append them to the request.
            if (Parameters != null) {
                var str = Encoding.Default.GetBytes(Parameters);
                var bytes = Encoding.UTF8.GetString(str);
                request.AppendUploadBytes(bytes, (uint)bytes.Length);
            }

            // If either the parameters or file path are defined, this is a POST request. Someday, we'll
            // figure out how to get Awesomium to understand Multipart Form data.
            if (Parameters != null || FilePath != null) {
                request.Method = "POST";
                request.AppendExtraHeader("Content-Type", "application/x-www-form-urlencoded"); //"multipart/form-data");
            }

            // Once the data has been appended to the page request, we need to disable this process. Otherwise,
            // it will keep adding the data to every request, including those that come from the web site.
            RequestEnabled = false;
            Parameters = null;
            FilePath = null;

            return null;
        }

        /** OnFilterNavigation
        ** Not currently used, but needed to keep VisualStudio happy.
        */
        public bool OnFilterNavigation(NavigationRequest request) {
            return false;
        }
    }
}

C# Parallel ForEach hold value

I am working on an app that searches for email addresses in the URLs of Google search results. The problem is it needs to return the value it found on each page plus the URL where it found the email, into a DataGridView with 2 columns: Email and URL.
I am using Parallel.ForEach for this, but of course it returns random URLs and not the ones it really found the email on.
public static string htmlcon; //html source
public static List<string> emailList = new List<string>();

public static string Get(string url, bool proxy)
{
    htmlcon = "";
    try
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        if (proxy)
            req.Proxy = new WebProxy(proxyIP + ":" + proxyPort);
        req.Method = "GET";
        req.UserAgent = Settings1.Default.UserAgent;
        if (Settings1.Default.EnableCookies == true)
        {
            CookieContainer cont = new CookieContainer();
            req.CookieContainer = cont;
        }
        WebResponse resp = req.GetResponse();
        StreamReader SR = new StreamReader(resp.GetResponseStream());
        htmlcon = SR.ReadToEnd();
        Thread.Sleep(400);
        resp.Close();
        SR.Close();
    }
    catch (Exception)
    {
        Thread.Sleep(500);
    }
    return htmlcon;
}
private void copyMails(string url)
{
    string emailPat = @"(\b[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b)";
    MatchCollection mailcol = Regex.Matches(htmlcon, emailPat, RegexOptions.Singleline);
    foreach (Match mailMatch in mailcol)
    {
        string email = mailMatch.Groups[1].Value;
        if (!emailList.Contains(email))
        {
            emailList.Add(email);
            Action dgeins = () => mailDataGrid.Rows.Insert(0, email, url);
            mailDataGrid.BeginInvoke(dgeins);
        }
    }
}
private void SEbgWorker_DoWork(object sender, DoWorkEventArgs e)
{
    //A LOT OF IRRELEVANT STUFF BEING RUN
    Parallel.ForEach(allSElist.OfType<string>(), (s) =>
    {
        //Get URL
        Get(s, Settings1.Default.Proxyset);
        //match mails 1st page
        copyMails(s);
    });
}
So this is it: I execute a Get request (where "s" is the URL from the list) and then execute copyMails(s) on the URL's HTML source, which uses a regex to copy out the emails.
If I do it without Parallel it returns the correct URL for each email in the DataGridView. How can I do this in parallel and still get the correct match in the DataGridView?
Thanks
You would be better off using PLINQ's Where to filter (pseudo code):

var results = from i in input.AsParallel()
              let u = get the URL from i
              let d = get the data from u
              let v = try get the value from d
              where v is found
              select new {
                  Url = u,
                  Value = v
              };

Underneath, AsParallel means that PLINQ's parallel implementation of the LINQ operators (Select, Where, ...) is used.
UPDATE: Now with more information
First, there are a number of issues in your code:
The variable htmlcon is static but used directly by multiple threads. This could well be your underlying problem. Consider just two input values: the first Get completes, setting htmlcon; before that thread's call to copyMails starts, the second thread's Get completes its HTML GET and overwrites htmlcon. The first thread then extracts the emails from the second page's content, so they are paired with the wrong URL.
The list emailList is also accessed without locking by multiple threads. Most collection types in .NET (and any other programming platform) are not thread safe; you need to limit access to a single thread at a time.
You are mixing up various activities in each of your methods. Consider applying the single responsibility principle.
Thread.Sleep to handle an exception?! If you can't handle an exception (i.e. resolve the condition) then do nothing. In this case if the action throws then the Parallel.ForEach will throw: that'll do until you define how to handle the HTML GET failing.
Three suggestions:
In my experience clean code (to an obsessive degree) makes things easier: the details of the format don't matter (one true brace style is better, but consistency is the key). Just going through and cleaning up the formatting showed up issues #1 and #2.
Good naming. Don't abbreviate anything used over more than a few lines of code unless that is a significant term for the domain. E.g. s for the action parameter in the parallel loop is really a url, so call it that. This kind of thing immediately makes the code easier to follow.
Think about that regex for emails: there are many valid emails that will not match (e.g. use of + to provide multiple logical addresses: example+one@gmail.com will be delivered to example@gmail.com and can then be used for local rules). Also an apostrophe (') is a valid character (I have known people frustrated by web sites that refused their addresses by getting this wrong). A more lenient check is sketched below.
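As a sketch of that more lenient check, System.Net.Mail.MailAddress can do the syntax check instead of a stricter regex (note it accepts some short forms, e.g. a@b, that you may still want to filter):

// Sketch: validate an extracted candidate with MailAddress; the round-trip
// comparison rejects inputs with display names or trailing junk.
private static bool LooksLikeEmail(string candidate)
{
    try
    {
        var addr = new System.Net.Mail.MailAddress(candidate);
        return addr.Address == candidate;
    }
    catch (FormatException)
    {
        return false;
    }
}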
Second: A relatively direct clean up:
public static string Get(string url, bool proxy) {
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    if (proxy) {
        req.Proxy = new WebProxy(proxyIP + ":" + proxyPort);
    }
    req.Method = "GET";
    req.UserAgent = Settings1.Default.UserAgent;
    if (Settings1.Default.EnableCookies == true) {
        CookieContainer cont = new CookieContainer();
        req.CookieContainer = cont;
    }
    using (WebResponse resp = req.GetResponse())
    using (StreamReader SR = new StreamReader(resp.GetResponseStream())) {
        return SR.ReadToEnd();
    }
}

private static Regex emailMatcher = new Regex(@"(\b[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b)", RegexOptions.Singleline);

private static string[] ExtractEmails(string htmlContent) {
    return emailMatcher.Matches(htmlContent).OfType<Match>()
                       .Select(m => m.Groups[1].Value)
                       .Distinct()
                       .ToArray();
}

private void SEbgWorker_DoWork(object sender, DoWorkEventArgs e) {
    Parallel.ForEach(allSElist.OfType<string>(), url => {
        var htmlContent = Get(url, Settings1.Default.Proxyset);
        var emails = ExtractEmails(htmlContent);
        foreach (var email in emails) {
            Action dgeins = () => mailDataGrid.Rows.Insert(0, email, url);
            mailDataGrid.BeginInvoke(dgeins);
        }
    });
}
Here I have:
Made use of using statements to automate the cleanup of resources.
Eliminated all mutable shared state.
Regex is explicitly documented to have thread-safe instance methods, so I only need a single instance.
Removed noise: no need to pass the URL to ExtractEmails because the extraction doesn't use the URL.
Get now only performs the HTML GET, ExtractEmails just the extraction.
Third: the above will block threads on the slowest operation: the HTML GET.
The real concurrency benefit would come from replacing HttpWebRequest.GetResponse and the reading of the response stream with their asynchronous equivalents.
Using Task would be the answer in .NET 4, but you would need to work with Stream and the encoding directly yourself, because StreamReader doesn't provide any BeginABC/EndABC method pairs. But .NET 4.5 is almost here, so apply some async/await:
Nothing to do in ExtractEmails.
Get is now asynchronous, blocking in neither the HTTP GET nor the reading of the result.
SEbgWorker_DoWork uses Tasks directly to avoid mixing too many different ways to work with TPL. Since Get returns a Task<string> we can simply chain a continuation (note that without TaskContinuationOptions the continuation also runs if the antecedent faulted, in which case htmlContentTask.Result rethrows the exception):
This should work in .NET 4.5, but without a set of valid URLs for which this will work I cannot test it.
public static async Task<string> Get(string url, bool proxy) {
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    if (proxy) {
        req.Proxy = new WebProxy(proxyIP + ":" + proxyPort);
    }
    req.Method = "GET";
    req.UserAgent = Settings1.Default.UserAgent;
    if (Settings1.Default.EnableCookies == true) {
        CookieContainer cont = new CookieContainer();
        req.CookieContainer = cont;
    }
    using (WebResponse resp = await req.GetResponseAsync())
    using (StreamReader SR = new StreamReader(resp.GetResponseStream())) {
        return await SR.ReadToEndAsync();
    }
}

private static Regex emailMatcher = new Regex(@"(\b[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b)", RegexOptions.Singleline);

private static string[] ExtractEmails(string htmlContent) {
    return emailMatcher.Matches(htmlContent).OfType<Match>()
                       .Select(m => m.Groups[1].Value)
                       .Distinct()
                       .ToArray();
}
private void SEbgWorker_DoWork(object sender, DoWorkEventArgs e) {
    var tasks = allSElist.OfType<string>()
        .Select(url => {
            return Get(url, Settings1.Default.Proxyset)
                .ContinueWith(htmlContentTask => {
                    // Without TaskContinuationOptions this always runs; .Result rethrows if Get faulted.
                    var htmlContent = htmlContentTask.Result;
                    var emails = ExtractEmails(htmlContent);
                    foreach (var email in emails) {
                        // No InvokeAsync on WinForms, so do this the old way.
                        Action dgeins = () => mailDataGrid.Rows.Insert(0, email, url);
                        mailDataGrid.BeginInvoke(dgeins);
                    }
                });
        })
        .ToArray();
    Task.WaitAll(tasks);
}
public static string htmlcon; //html source
public static List<string> emailList = new List<string>();
The problem is that these members, htmlcon and emailList, are shared resources among threads and among iterations. Each iteration in Parallel.ForEach executes in parallel; that's why you see strange behaviour.
How to solve the problem:
Modify your code and try to implement it without static variables or shared state.
One option is to change from Parallel.ForEach to TPL task chaining; with that change the result of one parallel operation becomes the input data of the next, and it is one of many ways to modify the code to avoid shared state.
Use locking or concurrent collections. Your htmlcon variable could be made volatile, but for the list you should use locks or concurrent collections.
The better way is to modify your code to avoid shared state, and there are many options for doing that based on your implementation, not only task chaining. A minimal sketch of the concurrent-collection option follows.
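As a sketch of the concurrent-collection option (assuming the emailMatcher regex from the earlier answer, and using System.Collections.Concurrent), a ConcurrentDictionary can serve as a thread-safe "seen" set so the duplicate check and the insert become one atomic step:

// Sketch: a thread-safe replacement for the shared List<string>.
// ConcurrentDictionary is used as a set: TryAdd returns true only for the
// first thread that adds a given email, so each email is inserted exactly once.
private static readonly ConcurrentDictionary<string, bool> seenEmails =
    new ConcurrentDictionary<string, bool>();

private void CopyMails(string htmlSource, string url)
{
    foreach (Match m in emailMatcher.Matches(htmlSource))
    {
        string email = m.Groups[1].Value;
        if (seenEmails.TryAdd(email, true))
        {
            Action dgeins = () => mailDataGrid.Rows.Insert(0, email, url);
            mailDataGrid.BeginInvoke(dgeins);
        }
    }
}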
