Search raw text from url, using keyword from textbox - c#

My buddy Xylophone and I have been at this for hours and can't figure it out; any help would be appreciated. I'm basically trying to read all the text from that URL and search for a keyword.
if (comboBoxEdit1.Text == "Hello")
{
    label2.Text = "Current Status: Searching...";
    this.dataGridView3.ScrollBars = ScrollBars.None;
    this.dataGridView3.MouseWheel += new MouseEventHandler(mousewheel);
    dataGridView3.Rows.Clear();
    string line;
    int row = 0;
    List<String> LinesFound = new List<string>();
    StreamReader file = new StreamReader("https://pastebin.com/raw/fWxKdRjN");
    while ((line = file.ReadLine()) != null)
    {
        if (line.Contains(textEdit1.Text))
        {
            string[] Columns = line.Split(':');
            dataGridView3.Rows.Add(line);
            for (int i = 0; i < Columns.Length; i++)
            {
                dataGridView3[i, row].Value = Columns[i];
            }
            row++;
            label2.Text = "Current Status: " + dataGridView3.Rows.Count + " Matche(s) Found";
        }
        else if (dataGridView3.RowCount == 0)
        {
            label2.Text = "Current Status: No Matche(s) Found";
        }
    }
}

You are doing it all wrong. StreamReader(string) expects a local file path, not a URL. If you want to read and parse the HTML content of a web page, you need to fetch the page using HttpClient, or better, take a look at this library: https://html-agility-pack.net/
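A minimal sketch of the HttpClient approach mentioned above, assuming the URL returns plain text with ':'-separated fields as in the question. The class and method names (KeywordSearch, DownloadTextAsync, FindMatchingLines) are made up for illustration and are not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class KeywordSearch
{
    // One shared client; HttpClient is meant to be reused across requests.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the response body of the URL as a string.
    public static Task<string> DownloadTextAsync(string url)
    {
        return Client.GetStringAsync(url);
    }

    // Returns every line of 'text' that contains 'keyword', split on ':'.
    public static List<string[]> FindMatchingLines(string text, string keyword)
    {
        var matches = new List<string[]>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(keyword))
                {
                    matches.Add(line.Split(':'));
                }
            }
        }
        return matches;
    }
}
```

In the form from the question, this could be called from an async event handler, for example `string text = await KeywordSearch.DownloadTextAsync("https://pastebin.com/raw/fWxKdRjN");` followed by `KeywordSearch.FindMatchingLines(text, textEdit1.Text)`. Setting the status label once after the search finishes, from the number of rows returned, avoids updating it on every line.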

We can use a regular expression to check whether 'raw' exists in the URL.
Regex.Matches() returns a MatchCollection with every occurrence of the match.
We can then use its Count property to find the number of occurrences.
Regular expression to match raw in a URL: (raw)
Below is the working code snippet:
using System;
using System.Text.RegularExpressions;
public static void Main()
{
    // '@' makes this a verbatim string literal.
    string pattern = @"(raw)";
    Regex rgx = new Regex(pattern);
    string url = "https://pastebin.com/raw/fWxKdRjN";
    // Run the match once and reuse the count.
    int count = rgx.Matches(url).Count;
    if (count > 0)
    {
        Console.WriteLine("Current Status: " + count + " Matche(s) Found");
    }
    else
    {
        Console.WriteLine("Current Status: No Matche(s) Found");
    }
}

Related

How can I find a phrase anywhere in a String Array?

I need to see if any phrase, such as "duckbilled platypus" appears in a string array.
In the case I'm testing, the phrase does exist in the string list, as shown here:
Yet, when I look for that phrase, as shown here:
...it fails to find it. I never get past the "if (found)" gauntlet in the code below.
Here is the code that I'm using to traverse the contents of one doc to see if any phrase (two words or more) is found in both documents:
private void FindAndStorePhrasesFoundInBothDocs()
{
string[] doc1StrArray;
string[] doc2StrArray;
slPhrasesFoundInBothDocs = new List<string>();
slAllDoc1Words = new List<string>();
int iCountOfWordsInDoc1 = 0;
int iSearchStartIndex = 0;
int iSearchEndIndex = 1;
string sDoc1PhraseToSearchForInDoc2;
string sFoundPhrase;
bool found;
int iLastWordIndexReached = iSearchEndIndex;
try
{
doc1StrArray = File.ReadAllLines(sDoc1Path, Encoding.UTF8);
doc2StrArray = File.ReadAllLines(sDoc2Path, Encoding.UTF8);
foreach (string line in doc1StrArray)
{
string[] subLines = line.Split();
foreach (string whirred in subLines)
{
if (String.IsNullOrEmpty(whirred)) continue;
slAllDoc1Words.Add(whirred);
}
}
iCountOfWordsInDoc1 = slAllDoc1Words.Count();
sDoc1PhraseToSearchForInDoc2 = slAllDoc1Words[iSearchStartIndex] + ' ' + slAllDoc1Words[iSearchEndIndex];
while (iLastWordIndexReached < iCountOfWordsInDoc1 - 1)
{
sFoundPhrase = string.Empty;
// Search for the phrase from doc1 in doc2;
found = doc2StrArray.Contains(sDoc1PhraseToSearchForInDoc2);
if (found)
{
sFoundPhrase = sDoc1PhraseToSearchForInDoc2;
iSearchEndIndex++;
sDoc1PhraseToSearchForInDoc2 = sDoc1PhraseToSearchForInDoc2 + ' ' + slAllDoc1Words[iSearchEndIndex];
}
else //if not found, inc vals of BOTH int args and, if sFoundPhrase not null, assign to sDoc1PhraseToSearchForInDoc2 again.
{
iSearchStartIndex = iSearchEndIndex;
iSearchEndIndex = iSearchStartIndex + 1;
if (!string.IsNullOrWhiteSpace(sFoundPhrase)) // add the previous found phrase if there was one
{
slPhrasesFoundInBothDocs.Add(sFoundPhrase);
}
sDoc1PhraseToSearchForInDoc2 = slAllDoc1Words[iSearchStartIndex] + ' ' + slAllDoc1Words[iSearchEndIndex];
} // if/else
iLastWordIndexReached = iSearchEndIndex;
} // while
} // try
catch (Exception ex)
{
MessageBox.Show("FindAndStorePhrasesFoundInBothDocs(); iSearchStartIndex = " + iSearchStartIndex.ToString() + "iSearchEndIndex = " + iSearchEndIndex.ToString() + " iLastWordIndexReached = " + iLastWordIndexReached.ToString() + " " + ex.Message);
}
}
doc2StrArray does contain the phrase sought, so why does doc2StrArray.Contains(sDoc1PhraseToSearchForInDoc2) fail?
This should do what you want. Array.Exists returns true as soon as any line contains the phrase:
found = Array.Exists(doc2StrArray, s => s.Contains(sDoc1PhraseToSearchForInDoc2));
On an array or List<T>, Contains() looks for an element equal to the argument. In your code, found is true only when a whole line of doc2 equals the phrase, NOT when the phrase is merely part of a line.
Try this:
// Any() is true as soon as one line contains the phrase as a substring.
found = doc2StrArray.Any(w => w.Contains(sDoc1PhraseToSearchForInDoc2));
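To make the difference concrete, here is a small sketch that puts the two checks side by side. PhraseSearch and its method names are hypothetical, chosen only for this illustration:

```csharp
using System;
using System.Linq;

public static class PhraseSearch
{
    // True only when some element is exactly equal to 'phrase'.
    public static bool HasExactLine(string[] lines, string phrase)
    {
        return lines.Contains(phrase);
    }

    // True when 'phrase' occurs anywhere inside at least one element.
    public static bool HasPhraseInAnyLine(string[] lines, string phrase)
    {
        return lines.Any(line => line.Contains(phrase));
    }
}
```

With lines read by File.ReadAllLines, a phrase that sits in the middle of a longer line is found only by the second method. Neither version finds a phrase that is split across a line break.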

C# Read a row in a file

Given a text file which contains the registration data like a database:
[ID] [Uname] [PW] [Email]
0 Aron asd asd@mail.com
1 Aron2 asdd asd@mail.com
I have the username and the password input.
How would I read only the line in this text file that matches my uname.Text and password.Text?
I agree with all the comments above. Assuming the file is not huge, you can simply load it all into memory and work on it:
// Load the file into a list of strings (string[] implements IList<string>)
IList<string> lines = File.ReadAllLines(@"\path\to\your\file.txt");
// Keep only the lines where the username is followed by the password.
// Regex.Escape stops characters such as '.' or '$' in the input from acting as regex operators.
var pattern = Regex.Escape(username) + "[ ]{1,}" + Regex.Escape(password);
Regex regex = new Regex(pattern);
IList<string> results = lines.Where(x => regex.IsMatch(x)).ToList();
Here's a .NET fiddler that shows this.
If anyone else has this problem, this is my solution:
bool found = false;
if (txt_uname.Text != "")
{
    // 'using' closes the file when the block ends.
    using (System.IO.StreamReader file = new System.IO.StreamReader(path))
    {
        // The first line holds the column names; read it so it is skipped below.
        string[] columnnames = file.ReadLine().Split('\t');
        string newline;
        while (!found && (newline = file.ReadLine()) != null)
        {
            string[] values = newline.Split('\t');
            // Stop at Length - 1 so that values[i + 1] stays in range.
            for (int i = 0; i < values.Length - 1; i++)
            {
                if (txt_uname.Text == values[i] && txt_pw.Text == values[i + 1])
                {
                    found = true;
                    break;
                }
            }
        }
    }
    // Report once, after the whole file has been checked.
    Console.WriteLine(found ? "User found" : "User doesn't exist");
}
Try this:
var username = "Aron2";
var password = "asdd";
List<string> matchedValues; // Contains field values of matched line.
var lines = File.ReadLines("input.txt");
foreach (string l in lines)
{
var values = l.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).ToList();
if (values.Contains(username) && values.Contains(password))
{
matchedValues = values;
break; // Matching line found. No need to loop further.
}
}
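A variation on the answers above, as a sketch only: it compares the username and password against their own columns instead of checking whether the values appear anywhere in the line. It assumes the column order ID, Uname, PW, Email from the question and fields separated by spaces or tabs. UserLookup and FindUser are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class UserLookup
{
    // Returns the fields of the first row whose second and third columns
    // equal 'username' and 'password', or null when no row matches.
    public static string[] FindUser(IEnumerable<string> lines, string username, string password)
    {
        char[] separators = { ' ', '\t' };
        foreach (string line in lines)
        {
            string[] fields = line.Split(separators, StringSplitOptions.RemoveEmptyEntries);
            // Assumed column order: ID, Uname, PW, Email.
            if (fields.Length >= 3 && fields[1] == username && fields[2] == password)
            {
                return fields;
            }
        }
        return null;
    }
}
```

It could be called as `UserLookup.FindUser(File.ReadLines("input.txt"), uname.Text, password.Text)`. Matching by column means a password that happens to equal another user's name or e-mail address does not produce a false hit.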

replacing text in a text file with \r\n

Currently I am building an agenda with extra options.
For testing purposes I store the data in a simple .txt file
(after that it will be connected to the agenda of a virtual assistant).
I have a problem changing or deleting text in this .txt file.
Although the part of the content that needs to be replaced and the search string are exactly the same, it doesn't replace the text in content.
code:
Change method
public override void Change(List<object> oldData, List<object> newData)
{
int index = -1;
for (int i = 0; i < agenda.Count; i++)
{
if(agenda[i].GetType() == "Task")
{
Task t = (Task)agenda[i];
if(t.remarks == oldData[0].ToString() && t.datetime == (DateTime)oldData[1] && t.reminders == oldData[2])
{
index = i;
break;
}
}
}
string search = "Task\r\nTo do: " + oldData[0].ToString() + "\r\nDateTime: " + (DateTime)oldData[1] + "\r\n";
reminders = (Dictionary<DateTime, bool>) oldData[2];
if(reminders.Count != 0)
{
search += "Reminders\r\n";
foreach (KeyValuePair<DateTime, bool> rem in reminders)
{
if (rem.Value)
search += "speak " + rem.Key + "\r\n";
else
search += rem.Key + "\r\n";
}
}
// get new data
string newRemarks = (string)newData[0];
DateTime newDateTime = (DateTime)newData[1];
Dictionary<DateTime, bool> newReminders = (Dictionary<DateTime, bool>)newData[2];
string replace = "Task\r\nTo do: " + newRemarks + "\r\nDateTime: " + newDateTime + "\r\n";
if(newReminders.Count != 0)
{
replace += "Reminders\r\n";
foreach (KeyValuePair<DateTime, bool> rem in newReminders)
{
if (rem.Value)
replace += "speak " + rem.Key + "\r\n";
else
replace += rem.Key + "\r\n";
}
}
Replace(search, replace);
if (index != -1)
{
remarks = newRemarks;
datetime = newDateTime;
reminders = newReminders;
agenda[index] = this;
}
}
replace method
private void Replace(string search, string replace)
{
StreamReader reader = new StreamReader(path);
string content = reader.ReadToEnd();
reader.Close();
content = Regex.Replace(content, search, replace);
content.Trim();
StreamWriter writer = new StreamWriter(path);
writer.Write(content);
writer.Close();
}
When running in debug I get the correct info:
content "-- agenda --\r\n\r\nTask\r\nTo do: test\r\nDateTime: 16-4-2012 15:00:00\r\nReminders:\r\nspeak 16-4-2012 13:00:00\r\n16-4-2012 13:30:00\r\n\r\nTask\r\nTo do: testing\r\nDateTime: 16-4-2012 9:00:00\r\nReminders:\r\nspeak 16-4-2012 8:00:00\r\n\r\nTask\r\nTo do: aaargh\r\nDateTime: 18-4-2012 12:00:00\r\nReminders:\r\n18-4-2012 11:00:00\r\n" string
search "Task\r\nTo do: aaargh\r\nDateTime: 18-4-2012 12:00:00\r\nReminders\r\n18-4-2012 11:00:00\r\n" string
replace "Task\r\nTo do: aaargh\r\nDateTime: 18-4-2012 13:00:00\r\nReminders\r\n18-4-2012 11:00:00\r\n" string
But it doesn't change the text. How do I make sure that the Regex.Replace finds the right piece of content?
PS. I did check several topics on this, but none of the solutions mentioned there work for me.
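You missed a ':' right after Reminders. The file content contains "Reminders:\r\n", but your search string contains "Reminders\r\n", so nothing matches. Just check it again :)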
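A related point, offered as a sketch rather than a fix for the colon itself: Regex.Replace treats the search text as a pattern, so remarks containing characters such as '.', '(' or '?' would match differently than expected. A literal replacement avoids that. LiteralReplace and its method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralReplace
{
    // Replaces 'search' as plain text; no character acts as a regex operator.
    public static string ReplaceLiteral(string content, string search, string replacement)
    {
        return content.Replace(search, replacement);
    }

    // Same result with Regex: escape the search text first. '$' is doubled in
    // the replacement because Regex.Replace reads "$1" and similar as substitutions.
    public static string ReplaceEscaped(string content, string search, string replacement)
    {
        return Regex.Replace(content, Regex.Escape(search), replacement.Replace("$", "$$"));
    }
}
```

In the Replace method from the question, `content = content.Replace(search, replace);` would be the simplest form. The result of content.Trim() also has to be assigned back to content to have any effect, since strings are immutable.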
You could try using a StringBuilder to build up what you want to write out to the file.
I just knocked up a quick example in a console app; it appears to work for me, and I think it might be what you are looking for.
StringBuilder sb = new StringBuilder();
sb.Append("Tasks\r\n");
sb.Append("\r\n");
sb.Append("\tTask 1 details");
Console.WriteLine(sb.ToString());
StreamWriter writer = new StreamWriter("Tasks.txt");
writer.Write(sb.ToString());
writer.Close();

How do I use lucene.net for searching file content?

I am currently using lucene.net to search the content of files for keyword search. I am able to get the results correctly but I have a scenario where I need to display the keywords found in a particular file.
There are two different files containing "karthik" and "steven", and if I search for "karthik and steven" I am able to get both the files displayed. If I search only for "karthik" and "steven" separately, only the respective files are getting displayed.
When I search for "karthik and steven" simultaneously I get both the files in the result as I am displaying the filename alone, and now I need to display the particular keyword found in that particular file as a record in the listview.
public bool StartSearch()
{
bool bResult = false;
Searcher objSearcher = new IndexSearcher(mstrIndexLocation);
Analyzer objAnalyzer = new StandardAnalyzer();
try
{
//Perform Search
DateTime dteStart = DateTime.Now;
Query objQuery = QueryParser.Parse(mstrSearchFor, "contents", objAnalyzer);
Hits objHits = objSearcher.Search(objQuery, objFilter);
DateTime dteEnd = DateTime.Now;
mlngTotalTime = (Date.GetTime(dteEnd) - Date.GetTime(dteStart));
mlngNumHitsFound = objHits.Length();
//GeneratePreviewText(objQuery, mstrSearchFor,objHits);
//Generate results - convert to XML
mstrResultsXML = "";
if (mlngNumHitsFound > 0)
{
mstrResultsXML = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><Results>";
//Loop through results
for (int i = 0; i < objHits.Length(); i++)
{
try
{
//Get the next result
Document objDocument = objHits.Doc(i);
//Extract the data
string strPath = objDocument.Get("path");
string strFileName = objDocument.Get("name");
if (strPath == null) { strPath = ""; }
string strLastWrite = objDocument.Get("last_write_time");
if (strLastWrite == null)
strLastWrite = "unavailable";
else
{
strLastWrite = DateField.StringToDate(strLastWrite).ToShortDateString();
}
double dblScore = objHits.Score(i) * 100;
string strScore = String.Format("{0:00.00}", dblScore);
//Add results as an XML row
mstrResultsXML += "<Row>";
//mstrResultsXML += "<Sequence>" + (i + 1).ToString() + "</Sequence>";
mstrResultsXML += "<Path>" + strPath + "</Path>";
mstrResultsXML += "<FileName>" + strFileName + "</FileName>";
//mstrResultsXML += "<Score>" + strScore + "%" + "</Score>";
mstrResultsXML += "</Row>";
}
catch
{
break;
}
}
//Finish off XML
mstrResultsXML += "</Results>";
//Build Dataview (to bind to datagrid
DataSet objDS = new DataSet();
StringReader objSR = new StringReader(mstrResultsXML);
objDS.ReadXml(objSR);
objSR = null;
mobjResultsDataView = new DataView();
mobjResultsDataView = objDS.Tables[0].DefaultView;
}
//Finish up
objSearcher.Close();
bResult = true;
}
catch (Exception e)
{
mstrError = "Exception: " + e.Message;
}
finally
{
objSearcher = null;
objAnalyzer = null;
}
return bResult;
}
Above is the code I am using for the search and the XML I am binding to the listview. Now I need to tag the particular keywords found in the respective document and display them in the listview as records, similar to the listview below:
No  FileName   KeyWord(s) Found
1   Test.Doc   karthik
2   Test2.Doc  steven
I hope you guys understood the question.
This depends on how your documents were indexed. You'll need to extract the original content, pass it through the analyzer to get the indexed tokens, and check which matches the generated query.
Just go with the Highlighter.Net package, part of contrib, which does this and more.
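To illustrate the idea of checking which search terms occur in a document, here is a rough sketch that does not use Lucene.Net at all. It assumes you can get at the original file content as a string, and its simple splitting is only a stand-in for what a real analyzer does. KeywordTagger and MatchedKeywords are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordTagger
{
    // Returns the keywords that occur as whole words in 'content', ignoring case.
    public static List<string> MatchedKeywords(string content, IEnumerable<string> keywords)
    {
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };
        var words = new HashSet<string>(
            content.Split(separators, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);
        return keywords.Where(k => words.Contains(k)).ToList();
    }
}
```

Calling this once per hit with the search terms would give the value for the KeyWord(s) Found column. For stemming, stop words or phrase queries, the highlighter mentioned above is the better fit.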

Remove words from string c#

I am working on an ASP.NET 4.0 web application. Its main goal is to go to the URL in the MyURL variable, read it from top to bottom, search for all lines that start with "description", and keep only those while removing all HTML tags. What I want to do next is remove the "description" text from the results afterwards so that I have just my device names left. How would I do this?
protected void parseButton_Click(object sender, EventArgs e)
{
MyURL = deviceCombo.Text;
WebRequest objRequest = HttpWebRequest.Create(MyURL);
objRequest.Credentials = CredentialCache.DefaultCredentials;
using (StreamReader objReader = new StreamReader(objRequest.GetResponse().GetResponseStream()))
{
originalText.Text = objReader.ReadToEnd();
}
//Read all lines of file
String[] crString = { "<BR> " };
String[] aLines = originalText.Text.Split(crString, StringSplitOptions.RemoveEmptyEntries);
String noHtml = String.Empty;
for (int x = 0; x < aLines.Length; x++)
{
if (aLines[x].Contains(filterCombo.SelectedValue))
{
noHtml += (RemoveHTML(aLines[x]) + "\r\n");
}
}
//Print results to textbox
resultsBox.Text = String.Join(Environment.NewLine, noHtml);
}
public static string RemoveHTML(string text)
{
text = text.Replace("&nbsp;", " ").Replace("<br>", "\n");
var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
return oRegEx.Replace(text, string.Empty);
}
Ok so I figured out how to remove the words through one of my existing functions:
public static string RemoveHTML(string text)
{
text = text.Replace("&nbsp;", " ").Replace("<br>", "\n").Replace("description", "").Replace("INFRA:CORE:", "")
.Replace("RESERVED", "")
.Replace(":", "")
.Replace(";", "")
.Replace("-0/3/0", "");
var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
return oRegEx.Replace(text, string.Empty);
}
public static void Main(String[] args)
{
string str = "He is driving a red car.";
Console.WriteLine(str.Replace("red", "").Replace("  ", " "));
}
Output:
He is driving a car.
Note: In the second Replace it's a double space.
Link : https://i.stack.imgur.com/rbluf.png
Try this. It will remove all occurrences of the word which you want to remove.
Try something like this, using LINQ:
List<string> lines = new List<string>{
"Hello world",
"Description: foo",
"Garbage:baz",
"description purple"};
//now add all your lines from your html doc.
if (aLines[x].Contains(filterCombo.SelectedValue))
{
lines.Add(RemoveHTML(aLines[x]) + "\r\n");
}
var myDescriptions = lines.Where(x=>x.ToLower().StartsWith("description"))
.Select(x=> x.ToLower().Replace("description",string.Empty)
.Trim());
// you now have "foo" and "purple", and anything else.
You may have to adjust for colons, etc.
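One way to handle the colons, sketched under the assumption that each line has already had its HTML removed: match the prefix case-insensitively and cut it off, so the device names keep their original casing. DescriptionFilter and ExtractNames are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DescriptionFilter
{
    private const string Prefix = "description";

    // Keeps lines that start with "description" (any case) and returns the
    // text after the prefix, with surrounding spaces and colons removed.
    public static List<string> ExtractNames(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Trim())
            .Where(line => line.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            .Select(line => line.Substring(Prefix.Length).Trim(' ', ':'))
            .ToList();
    }
}
```

Unlike the ToLower() version, this leaves the rest of the line untouched, and it removes the word only at the start of the line rather than everywhere it occurs.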
void Main()
{
string test = "<html>wowzers description: none <div>description:a1fj391</div></html>";
IEnumerable<string> results = getDescriptions(test);
foreach (string result in results)
{
Console.WriteLine(result);
}
//result: none
// a1fj391
}
static Regex MyRegex = new Regex(
"description:\\s*(?<value>[\\d\\w]+)",
RegexOptions.Compiled);
IEnumerable<string> getDescriptions(string html)
{
foreach(Match match in MyRegex.Matches(html))
{
yield return match.Groups["value"].Value;
}
}
Adapted From Code Project
string value = "ABC - UPDATED";
int index = value.IndexOf(" - UPDATED");
if (index != -1)
{
value = value.Remove(index);
}
After this, value is ABC, without - UPDATED.
