Regex to get url from a text file


I am currently working on a C# project on Windows. I have a text file that contains the following text.
I want to grab this: http://google.com/en/login
or
just google.com
Sometimes the URL changes, like below:
SiteURL=http://google.com/en/login
SiteURL=https://google.com/en/login
SiteURL=http://www.google.com/en/login
SiteURL=https://www.google.com/en/login
In every case, I want to grab the text after SiteURL=, or just google.com.
I have no knowledge of regex; this is a small part of what I need in my project.
The text below is present in the text file. While reading the file, I want to grab that URL. Thank you very much.
[Wordlist]
UserIndex=1
PassIndex=2
EmailIndex=0
543835182C9E9FFF099CD106D4253D3A=100
[Settings]
SiteURL=http://google.com/en/login
Timeout=20
WaitBot=0
ResolveHost=0
ComboFilter=0
ComboMode=0
EmailFilter=0
EmailMode=0
UsernameStart=6
UsernameEnd=8
InvalidChars=
AllowedChars=
Letters=0
Digits=0
Alpha=0
Email=0
LowerUpper=0
LetterDigit=0
SpeciaChar=0
PasswordStart=6
PasswordEnd=8
PasswordInvalidChars=
PasswordAllowedChars=
PasswordLetters=0
PasswordDigits=0
PasswordAlpha=0
PasswordEmail=0
PasswordLowerUpper=0
PasswordLetterDigit=0
PasswordSpeciaChar=0
ProxyActivate=10
ProxyRatio=4
ProxyCombo=0
WaitTime=1
BanWindowWidth=1
BanWindowRatio=10
BanWindowProxies=10
blnNoProxies=1
HTTPHeader=<ACTION> <FORM ACTION> <HTTP VERSION>|Accept: */*|User-Agent: <USER AGENT>|Host: <HOST>|Pragma: no-cache|Connection: keep-alive|
RequestMethod=2
Referer=0
POSTData=login=<USER>&password=<PASS>&referer=
[Fake]
AfterFP=1
blnSuccess=0
SuccessRetries=3
blnProcessErrors=0
blnCompleteNot=1
EnableConHits=0
ConHits=0
FollowRedirect=1
EnableConLength=0
ConLength=-1
SourceTags=1
UserField=0
HTTPFollow=1
blnForbToOK=0
ForbToOkLength=1000
blnBadOcrCode=0
BadOcrCodeRetries=3
[Keywords]
EnableHeaderSuccess=0
EnableHeaderBan=0
EnableHeaderFail=0
EnableHeaderRetry=0
HeaderSuccess=
HeaderBan=
HeaderFail=
HeaderRetry=
EnableGlobalSourceRetry=1
EnableSourceSuccess=1
EnableSourceBan=0
EnableSourceFail=1
EnableSourceRetry=0
SourceSuccess=>Logout
SourceBan=
SourceFail=Fail login
SourceRetry=
[Form]
IAParse=0
LoginPostData=
LoginMethod=1
LoginHeader=0
Action=http://google.com/en/login
Username=login
Password=password
Email=
AddData=referer=
CustomData=
NoIndex=
Cookie=identity=f03982a8f9c847e9a23cb818912f7a51; symfony=0el04cmspgogapkkt6k26uo3b4
IAction=-1
IUser=-1
IPass=-1
IEmail=-2
ICaptcha=-1
ReqReferer=
ReqCookie=
AjaxURL=
AjaxPOSTData=
AjaxData=
AjaxParsingCode=
RefData=
ParsingCode=
FormRedirectUrl=
RedPostData=
RedKeys=
DataDesc=Cracked BY ***************** Team = Your account is&Valid to
CaptureParsingCode=s: |<|#00|#00|0|#00|#00|0&o: | |#00|#00|0|#00|#00|0
RefreshSession=0
RefreshCookie=0
FormHeader=0
AjaxHeader=0
RedHeader=0
IAMethod=2
POSTMethod=2
RedMethod=1
ImageAfterAjax=0
blnBasic=0
FollowRedirectsOnIA=0
FollowRedirectsOnRed=1
[Ajax]
Variables=
PostElements2=
RedURL=
[OCR]
OCRMode=0
URLMode=0
ImageURLID=||
Captcha=
OCRKey=
RefreshCaptcha=0
blnContrast=0
blnBrightness=0
blnSaturation=0
blnThreshold=0
blnInvert=0
blnNoise=0
blnIsolate=0
blnResize=0
blnBorder=0
blnCharExtract=0
blnRemoveColors=0
blnStringFilter=0
blnLetter=1
blnDigits=1
blnBlur=0
blnReconstruct=0
blnLower=0
blnUpper=0
blnRemoveLines=0
blnMultiChar=0
blnCharTable=0
blnPalette=0
blnCharResize=0
blnCharSubExtraction=0
blnThreeImages=0
blnGif=0
blnCompute=0
blnBorderPre=0
Contrast=0
Brightness=0
Saturation=0
Threshold=0
Noise=1
Isolate=1
Resize=2
BorderLeft=0
BorderTop=0
BorderRight=0
BorderBottom=0
CharExtractMinBlack=0
CharExtractMaxBlack=1
CharExtractMinWidth=1
CharRotateMax=0
CharRotateSteps=5
MinLength=1
MaxLength=10
BlurRadius=1
CharExtractMaxWidth=33
CharWidthMinBlack=2
CharSpace=1
Range=0
InvertDensity=0
InvertLength=20
LineCurvatureMax=4
LineWidthMax=13
CharResize=1
CharHeight=13
GifStart=2
GifOffset=2
BorderLeftPre=0
BorderTopPre=0
BorderRightPre=0
BorderBottomPre=0
CharBorderH=5
CharBorderV=5
CharRotateBorder=5
CharExtractMinHeight=1
VerticalRejoin=30
CharExclude=
SpecialChars=
Colors=
Colors2=
Lines=Min Length: 2, Max Width: 5, Horizzontal
Language=eng

using System.IO;
using System.Text.RegularExpressions;
var pattern = @"(?<=SiteURL=)[^\r\n]+"; // text after "SiteURL=" up to the end of the line; [^\r\n] avoids capturing the '\r' of Windows line endings
string text = File.ReadAllText(path); // your path needs to be added here (e.g. @"C:\Users\userx\Downloads\file.txt")
Match match = Regex.Match(text, pattern);
If match.Success is true, the link you're looking for is in match.Value.
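If the file is always made of INI-style key=value lines, as in the sample, a line-based parse avoids regex entirely, and System.Uri can then reduce the URL to just its host (the "just google.com" part of the question). This is a minimal sketch; the class and method names are hypothetical, and it assumes the key is spelled SiteURL= at the start of a line.

```csharp
using System;
using System.IO;

// Hypothetical helper: reads the SiteURL= value from INI-style key=value lines.
public static class SiteUrlReader
{
    private const string Key = "SiteURL=";

    // Returns the text after "SiteURL=" on the first matching line, or null if there is none.
    public static string FindSiteUrl(string[] lines)
    {
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.StartsWith(Key, StringComparison.OrdinalIgnoreCase))
                return trimmed.Substring(Key.Length).Trim();
        }
        return null;
    }

    // Reads the file (you supply the path) and returns the SiteURL value, or null.
    public static string FindSiteUrlInFile(string path) =>
        FindSiteUrl(File.ReadAllLines(path));

    // Reduces an absolute URL to its host, e.g. "www.google.com"; null if it is not a valid absolute URL.
    public static string GetHost(string url) =>
        Uri.TryCreate(url, UriKind.Absolute, out Uri uri) ? uri.Host : null;
}
```

Usage: string url = SiteUrlReader.FindSiteUrlInFile(path); then, if url is not null, SiteUrlReader.GetHost(url) gives google.com or www.google.com. File.ReadAllLines already strips the line endings, so the trailing '\r' problem of the regex version does not arise here.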

Related

Adding hyperlinks in element notes in Enterprise Architect

I am working on Enterprise Architect C# add-ins. I am trying to add a hyperlink to another package in an element's notes through an add-in.
I found code for adding a hyperlink element that points to a package here: https://www.sparxsystems.com/forums/smf/index.php?topic=4068.0
and tried the following code:
EA.Package parentPkg = Session.Repository.GetPackageByID(currentPackage.ParentID);//target package
hyperlink = currentPackage.Elements.AddNew("$package://"+parentPkg.PackageGUID, "Text"); //adding hyperlink
hyperlink.Update();
hyperlink.Subtype = 19;
hyperlink.Update();
hyperlink.Notes = parentPkg.Name;
hyperlink.Update();
demoElement.Notes = "test for packages hyperlinks" + hyperlink; //demo element's notes must contain hyperlink to target package
mobjElement.Update();
It is not displayed as a hyperlink but as System.__ComObject.
Kindly help. Thanks in advance.

As Geert and Thomas suggested, if you just need a hyperlink in the notes, wrap the word in an <a href> tag, like this:
This is a <a href="$element://{64162D99-026B-40b3-914C-2CC009943540}"><font color="#0000ff"><u>Hyperlink</u></font></a> Example
In the API, you can just put this markup in the Notes property of any element:
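switch ( treeSelectedType )
{
    case otElement :
    {
        // Code for when an element is selected
        var theElement as EA.Element;
        theElement = Repository.GetTreeSelectedObject();
        // The <a href="$element://{GUID}"> wrapper is what makes the text a link;
        // the <font> and <u> tags only style it.
        theElement.Notes = "This is a <a href=\"$element://{64162D99-026B-40b3-914C-2CC009943540}\"><font color=\"#0000ff\"><u>Hyperlink</u></font></a> Example";
        theElement.Update();
        theElement.Refresh();
        break;
    }
}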
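The System.__ComObject text in the question appears because the hyperlink element object itself is concatenated into the string, and string concatenation calls ToString() on the COM wrapper. A minimal sketch of a C# helper that builds the markup from the answer above as a plain string instead; NoteLinks and Build are hypothetical names, and whether the "$element://" scheme from that answer also works for a package target is an assumption to verify in EA.

```csharp
using System;
using System.Net;

// Hypothetical helper that produces the notes markup shown in the answer above.
public static class NoteLinks
{
    // Wraps the display text in <a href="$element://{GUID}"> plus blue, underlined styling.
    // ASSUMPTION: "$element://" is taken from the answer; a package target may need another scheme.
    public static string Build(string guid, string text)
    {
        if (string.IsNullOrEmpty(guid))
            throw new ArgumentException("A GUID is required.", nameof(guid));
        // Escape the display text so characters such as '<' or '&' don't break the markup.
        string safeText = WebUtility.HtmlEncode(text);
        return "<a href=\"$element://" + guid + "\"><font color=\"#0000ff\"><u>"
            + safeText + "</u></font></a>";
    }
}
```

In the question's add-in this would be used as demoElement.Notes = "test for packages hyperlinks " + NoteLinks.Build(parentPkg.PackageGUID, parentPkg.Name); followed by demoElement.Update(), so only strings are concatenated and no separate hyperlink element is needed.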

I tried (as Geert suggested) the following code snippet (sorry for the Perl):
my $e = $rep->getElementByGuid("{92EF2B52-B75E-454d-AD03-5BDC12256A36}");
$e->{notes} = "<font color=\"#0000ff\"><u>Link name</u></font>";
$e->Update();
Just replace the GUID and the display name and you have a hyperlink to a package. Note that the above string has some escape chars, so here's the raw text:
<font color="#0000ff"><u>Link name</u></font>

Using Regex to insert domain name into url

I am pulling in text from a database that is formatted like the sample below. I want to insert the domain name in front of every URL within this block of text.
<p>We recommend you check out the article
<a id="navitem" href="/article/why-apples-new-iphones-may-delight-and-worry-it-pros/" target="_top">
Why Apple's new iPhones may delight and worry IT pros</a> to learn more</p>
So with the example above in mind I want to insert http://www.mydomainname.com/ into the URL so it reads:
href="http://www.mydomainname.com/article/why-apples-new-iphones-may-delight-and-worry-it-pros/"
I figured I could use a regex to replace href=" with href="http://www.mydomainname.com, but it is not working as I intended. Any suggestions, or a better method I should try?
var content = Regex.Replace(DataBinder.Eval(e.Item.DataItem, "Content").ToString(),
"^href=\"$", "href=\"https://www.mydomainname.com/");

You could use regex...
...but it's very much the wrong tool for the job.
Uri has some handy constructors/factory methods for just this purpose:
Uri ConvertHref(Uri sourcePageUri, string href)
{
    // could really just be: return new Uri(sourcePageUri, href);
    // but TryCreate lets you handle bad input without catching an exception
    if (Uri.TryCreate(sourcePageUri, href, out Uri newAbsUri))
    {
        return newAbsUri;
    }
    throw new ArgumentException("Cannot resolve href against the source page URI.", nameof(href));
}
So, say sourcePageUri is:
var sourcePageUri = new Uri("https://somehost/some/page");
Here is the output of the method for a few different values of href:
https://www.foo.com/woo/har => https://www.foo.com/woo/har
/woo/har => https://somehost/woo/har
woo/har => https://somehost/some/woo/har
...so it's the same interpretation as the browser makes. Perfect, no?
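To apply this to the whole block of HTML from the question, the two ideas can be combined: a regex only locates each href value, and Uri.TryCreate does the resolving. A minimal sketch; HrefRewriter and MakeAbsolute are hypothetical names, and it assumes double-quoted href attributes only, which is one reason a real HTML parser remains the more robust choice.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: makes every double-quoted href in an HTML fragment absolute.
public static class HrefRewriter
{
    // Matches href="..." but not e.g. data-href="..."; single-quoted or unquoted values are not handled.
    private static readonly Regex HrefPattern =
        new Regex("(?<![\\w-])href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Resolves each href against baseUri the same way ConvertHref does.
    public static string MakeAbsolute(string html, Uri baseUri)
    {
        return HrefPattern.Replace(html, m =>
        {
            string href = m.Groups[1].Value;
            // Leave values that cannot be resolved unchanged instead of throwing.
            return Uri.TryCreate(baseUri, href, out Uri abs)
                ? "href=\"" + abs.AbsoluteUri + "\""
                : m.Value;
        });
    }
}
```

With baseUri set to new Uri("http://www.mydomainname.com/"), the question's link becomes href="http://www.mydomainname.com/article/why-apples-new-iphones-may-delight-and-worry-it-pros/", while hrefs that are already absolute are left as they are.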

Try this code:
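// $1 keeps the matched href=" part; the leading "/" is consumed and re-added by the domain's trailing "/"
// ("\/" is not a valid C# string escape, so the slash is written plainly)
var content = Regex.Replace(DataBinder.Eval(e.Item.DataItem, "Content").ToString(),
    "(href=[ \t]*\")/", "$1https://www.mydomainname.com/", RegexOptions.Multiline);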

Use an HTML parser, such as CsQuery:
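using CsQuery;

var html = "your html text here";
var path = "http://www.mydomainname.com";
CQ dom = html;
// Only select root-relative links (href starting with "/") so absolute URLs are not prefixed twice.
CQ links = dom["a[href^='/']"];
foreach (var link in links)
    link.SetAttribute("href", path + link["href"]);
html = dom.Html();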

Pulling Data from Word Form

Using C#, I need to pull data from a Word document. I have NetOffice for Word installed in the project. The data is in two parts.
First, I need to pull data from the document settings.
Second, I need to pull the content of controls in the document. The content of the fields includes checkboxes, a date, and a few paragraphs. The input method is via controls, so there must be some way to interact with the controls via the API, but I don't know how to do that.
Right now, I have the following code to pull the flat text from the document:
private static string wordDocument2String(string file)
{
NetOffice.WordApi.Application wordApplication = new NetOffice.WordApi.Application();
NetOffice.WordApi.Document newDocument = wordApplication.Documents.Open(file);
string txt = newDocument.Content.Text;
wordApplication.Quit();
wordApplication.Dispose();
return txt;
}
So the question is: how do I pull the data from the controls from the document, and how do I pull the document settings (such as the title, author, etc. as seen from word), using either NetOffice, or some other package?

I did not use NetOffice, but the calls should mostly be the same (except probably for object creation and disposal).
using Microsoft.Office.Interop.Word; // for Field, WdFieldType, WdProtectionType, ContentControl
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
string file = "C:\\Hello World.docx";
Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(file);
// look for a specific type of Field (there are about 200 to choose from).
foreach (Field f in doc.Fields)
{
if (f.Type == WdFieldType.wdFieldDate)
{
//do something
}
}
// example of the myriad properties that could be associated with "document settings"
WdProtectionType protType = doc.ProtectionType;
if (protType.Equals(WdProtectionType.wdAllowOnlyComments))
{
//do something else
}
The MSDN reference on Word Interop is where you will find information on just about anything you need access to in a Word document.
UPDATE:
After reading your comment, here are a few document settings you can access:
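// C# cannot call BuiltInDocumentProperties like a method (that syntax is VB),
// so index the late-bound collection through dynamic
dynamic props = doc.BuiltInDocumentProperties;
string author = props["Author"].Value;
string name = doc.Name; // this gives you the file name.
// not clear what you mean by "title"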
As far as understanding what text you are getting from a "legacy control", I need more information about exactly what kind of control you are extracting from. Try getting the name of the control/textbox/form/etc. from within the document itself, and then look up that property online.
As a stab in the dark, here is an (incomplete) example of getting text from textboxes in the document:
List<string> textBoxText = new List<string>();
foreach (Microsoft.Office.Interop.Word.Shape s in doc.Shapes)
{
textBoxText.Add(s.TextFrame.TextRange.Text); //this could result in an error if there are shapes that don't contain text.
}
Another possibility is Content Controls, of which there are several types. They are often used to gather user input.
Here is some code to catch a rich text Content Control:
List<string> contentControlText = new List<string>();
foreach(ContentControl CC in doc.ContentControls)
{
if (CC.Type == WdContentControlType.wdContentControlRichText)
{
contentControlText.Add(CC.Range.Text);
}
}

Validate folder name in C#

I need to validate a folder name in C#.
I have tried the following regex:
^(.*?/|.*?\\)?([^\./|^\.\\]+)(?:\.([^\\]*)|)$
but it fails. I also tried using GetInvalidPathChars().
It fails when I use P:\abc as a folder name, i.e. DriveLetter:\FolderName.
Can anyone suggest why?

You could do it this way (using the characters returned by System.IO.Path.GetInvalidPathChars()):
using System.IO;
using System.Text.RegularExpressions;

bool IsValidFilename(string testName)
{
    // GetInvalidPathChars() returns char[], so turn it into a string before escaping it
    Regex containsABadCharacter = new Regex("[" + Regex.Escape(new string(Path.GetInvalidPathChars())) + "]");
    if (containsABadCharacter.IsMatch(testName)) { return false; }
    // other checks for UNC, drive-path format, etc
    return true;
}
[edit]
If you want a regular expression that validates a folder path, then you could use this one:
Regex regex = new Regex("^([a-zA-Z]:)?(\\\\[^<>:\"/\\\\|?*]+)+\\\\?$");
[edit 2]
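I've remembered one tricky thing that lets you check whether a path is correct. Path.GetInvalidPathChars() takes no arguments; check the path against the characters it returns:
bool pathHasInvalidChars = path.IndexOfAny(Path.GetInvalidPathChars()) >= 0;
or (for file names):
bool nameHasInvalidChars = fileName.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0;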

Validating a folder name correctly can be quite a mission. See my blog post Taking data binding, validation and MVVM to the next level - part 2.
Don't be fooled by the title, it's about validating file system paths, and it illustrates some of the complexities involved in using the methods provided in the .Net framework. While you may want to use a regex, it isn't the most reliable way to do the job.
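As an illustration of those complexities, here is a minimal sketch that validates "P:\abc"-style paths one segment at a time instead of with a single regex. WindowsFolderName and its methods are hypothetical names. It hard-codes the Windows naming rules (the characters < > : " / \ | ? * and control characters 0 to 31, reserved device names such as CON or LPT1, no trailing dot or space) rather than calling Path.GetInvalidFileNameChars(), which returns a much smaller set on non-Windows platforms. UNC paths and length limits are not handled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper encoding the Windows rules for folder names.
public static class WindowsFolderName
{
    // Characters Windows does not allow in a file or folder name, plus control chars 0-31.
    private static readonly char[] InvalidNameChars =
        "<>:\"/\\|?*".ToCharArray()
            .Concat(Enumerable.Range(0, 32).Select(i => (char)i))
            .ToArray();

    // Reserved device names; also reserved when followed by an extension (e.g. "CON.txt").
    private static readonly string[] ReservedNames =
    {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };

    // Validates a single path segment such as "abc" (not a full path).
    public static bool IsValidName(string name)
    {
        if (string.IsNullOrWhiteSpace(name)) return false;
        if (name.IndexOfAny(InvalidNameChars) >= 0) return false;
        // Windows silently strips trailing dots and spaces, so reject them.
        if (name.EndsWith(".") || name.EndsWith(" ")) return false;
        string stem = name.Split('.')[0];
        return !ReservedNames.Contains(stem, StringComparer.OrdinalIgnoreCase);
    }

    // Validates paths like "P:\abc\def": optional drive root, then backslash-separated names.
    public static bool IsValidFolderPath(string path)
    {
        if (string.IsNullOrEmpty(path)) return false;
        string rest = Regex.IsMatch(path, @"^[A-Za-z]:\\") ? path.Substring(3) : path;
        if (rest.Length == 0) return true; // just a drive root such as "P:\"
        return rest.TrimEnd('\\').Split('\\').All(IsValidName);
    }
}
```

With this, IsValidFolderPath(@"P:\abc") returns true, while names such as "a?b", "con" or "abc." are rejected.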

This is the regex you should use:
Regex regex = new Regex("^([a-zA-Z0-9][^*/><?\"|:]*)$");
if (!regex.IsMatch(txtFolderName.Text))
{
MessageBox.Show(this, "Folder fail", "info", MessageBoxButtons.OK, MessageBoxIcon.Information);
metrotxtFolderName.Focus();
}

How to get Google Plus's post data ( likes - shares - comments )?

Using C#, I want to read the shares, comments and likes of a Google+ post like this one: https://plus.google.com/107200121064812799857/posts/GkyGQPLi6KD

That post is an activity. This page includes information on how to get information about activities. This page gives some examples of using the API. This page has downloads for the Google API .NET library, which you can use to access the Google+ APIs, with XML documentation etc.
You'll need to use the API Console to get an API key and manage your API usage.
Also take a look at the API Explorer.
Here's a working example:
Referencing Google.Apis.dll and Google.Apis.Plus.v1.dll
PlusService plus = new PlusService();
plus.Key = "YOURAPIKEYGOESHERE";
ActivitiesResource ar = new ActivitiesResource(plus);
ActivitiesResource.Collection collection = new ActivitiesResource.Collection();
//107... is the poster's id
ActivitiesResource.ListRequest list = ar.List("107200121064812799857", collection);
ActivityFeed feed = list.Fetch();
//You'll obviously want to use a _much_ better way to get
// the activity id, but you aren't normally searching for a
// specific URL like this.
string activityKey = "";
foreach (var a in feed.Items)
if (a.Url == "https://plus.google.com/107200121064812799857/posts/GkyGQPLi6KD")
{
activityKey = a.Id;
break;
}
ActivitiesResource.GetRequest get = ar.Get(activityKey);
Activity act = get.Fetch();
Console.WriteLine("Title: "+act.Title);
Console.WriteLine("URL:"+act.Url);
Console.WriteLine("Published:"+act.Published);
Console.WriteLine("By:"+act.Actor.DisplayName);
Console.WriteLine("Annotation:"+act.Annotation);
Console.WriteLine("Content:"+act.Object.Content);
Console.WriteLine("Type:"+act.Object.ObjectType);
Console.WriteLine("# of +1s:"+act.Object.Plusoners.TotalItems);
Console.WriteLine("# of reshares:"+act.Object.Resharers.TotalItems);
Console.ReadLine();
Output:
Title: Wow Awesome creativity...!!!!!
URL:http://plus.google.com/107200121064812799857/posts/GkyGQPLi6KD
Published:2012-04-07T05:11:22.000Z
By:Funny Pictures & Videos
Annotation:
Content: Wow Awesome creativity...!!!!!
Type:note
# of +1s:210
# of reshares:158
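The example above hard-codes the poster's id ("107...") passed to ar.List. A minimal sketch of pulling it out of the post URL instead, assuming every post URL has the shape https://plus.google.com/{userId}/posts/{postId} seen in the question; PlusPostUrl and TryParse are hypothetical names. The trailing segment is not the activity id that ar.Get expects, which is why the answer still looks the activity up in the feed by URL.

```csharp
using System;

// Hypothetical helper: splits a post URL of the assumed shape into its two segments.
public static class PlusPostUrl
{
    // Returns false if the URL is not absolute or does not look like /{userId}/posts/{postId}.
    public static bool TryParse(string url, out string userId, out string postId)
    {
        userId = null;
        postId = null;
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri)) return false;
        string[] parts = uri.AbsolutePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 3 || parts[1] != "posts") return false;
        userId = parts[0]; // the poster's id, as used in ar.List(...)
        postId = parts[2]; // the post's URL segment (not the activity id)
        return true;
    }
}
```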
