parsing text from HTML source - c#

I have this HTML (in XML form) result in my program.
All I want is to get the info from this source (director, music, ...).
Is there any way to group the text like 1 and 2 in the picture with C#?

The quickest option you have is to use .Split. First, split the entire source on the character { (this gives you your sections), then .Split each of those sections again on the character |. From there you only need to parse what you need; you'll end up with an array of Name=Value strings.
Something like this will help:
var blocks = YourVariableHoldingSource.Split('{');
foreach (var block in blocks) {
    // Split the current block (not the whole array) into its fields
    var details = block.Split('|');
    foreach (var data in details) {
        MessageBox.Show(data);
    }
}
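As a minimal sketch of the double split, assuming the source looks something like {director=A|music=B}{director=C|music=D} (the real layout is only visible in the picture, so this sample and the names BlockParser and ParseBlocks are hypothetical), each block can be turned into a dictionary of names and values:

```csharp
using System;
using System.Collections.Generic;

public static class BlockParser
{
    // Splits the source into '{'-delimited blocks, then each block into
    // '|'-delimited Name=Value pairs. The input layout is an assumption.
    public static List<Dictionary<string, string>> ParseBlocks(string source)
    {
        var result = new List<Dictionary<string, string>>();
        foreach (var block in source.Split(new[] { '{' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var fields = new Dictionary<string, string>();
            foreach (var pair in block.TrimEnd('}', ' ', '\r', '\n').Split('|'))
            {
                // Split on the first '=' only, so values may contain '='.
                var parts = pair.Split(new[] { '=' }, 2);
                if (parts.Length == 2)
                    fields[parts[0].Trim()] = parts[1].Trim();
            }
            if (fields.Count > 0)
                result.Add(fields);
        }
        return result;
    }
}
```

With that in place, ParseBlocks(source)[0]["director"] would give the director of the first block, provided the source really uses = between names and values.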


How do I properly update information in a text file?

I want to save some data in a text file like this:
Name = Frank
Age = 28
Registered = False
Now I want to read and update the data contained in each row. For example, to change the Name to "Tim", I have to find the Name row and then replace the string after the "=".
I'm not quite sure how to solve this properly, and I couldn't find anything on Google that satisfied me.
I tried to update it with the text.Replace() method, but it only changes the exact string it finds.
I expect to read the correct data out of the row and replace it if needed.
There are a wide variety of ways to do this. I'll contribute one of them (which I think is the simplest to understand).
Step 1: Read the entire file into a string.
Step 2: Split the string into an array of lines.
Step 3: Process the lines using simple methods like split and join.
Step 4: Overwrite the previous file with the processed string.
The code is below:
if (File.Exists(your_file_path)){
    string yourfile = File.ReadAllText(your_file_path);
    // Now the file is a simple string that you can manipulate using
    // string split functions.
    // For example:
    // break by lines
    string[] lines = yourfile.Split('\n');
    // Use an indexed loop: a foreach variable cannot be assigned to
    for (int i = 0; i < lines.Length; i++){
        // StartsWith is safe on lines shorter than four characters
        if (lines[i].StartsWith("Name")){
            // replace the necessary line
            lines[i] = "Name = Tim";
            break;
        }
    }
    // Join the array again
    yourfile = string.Join("\n", lines);
    File.WriteAllText(your_file_path, yourfile);
}
Try saving the file in JSON format, like this:
{
"Name": "Frank",
"Age": 28,
"Registered": false
}
Then read the file and deserialize it to an object using Newtonsoft Json.NET.
Then update your property (Name), serialize the object to a string again, and write it back to the same file.
This approach leaves far less room for error.
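The read, update, write round trip could look roughly like the sketch below. Note that it swaps in the built-in System.Text.Json serializer instead of Newtonsoft so that it runs without an extra package; the Profile class and the ProfileStore.Rename helper are hypothetical names, and the file is assumed to hold valid JSON with Name, Age and Registered fields.

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical shape matching the three fields in the question.
public class Profile
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public bool Registered { get; set; }
}

public static class ProfileStore
{
    // Reads the JSON file, changes Name, and writes the file back.
    public static void Rename(string path, string newName)
    {
        var profile = JsonSerializer.Deserialize<Profile>(File.ReadAllText(path));
        if (profile == null)
            throw new InvalidDataException("The file does not contain a JSON object.");

        profile.Name = newName;

        // WriteIndented keeps the file readable for humans.
        var options = new JsonSerializerOptions { WriteIndented = true };
        File.WriteAllText(path, JsonSerializer.Serialize(profile, options));
    }
}
```

Calling ProfileStore.Rename(path, "Tim") leaves Age and Registered untouched, because the whole object is written back rather than a single patched line.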

Importing .csv file in to listview

I'm trying to load a .csv file into a listview:
ofDialog.Filter = @"CSV Files|*.csv";
ofDialog.Title = @"Select your backlink file...";
ofDialog.FileName = "backlinks.csv";
// is cancel pressed?
if (ofDialog.ShowDialog() == DialogResult.Cancel)
return;
try
{
string filename = ofDialog.FileName;
var lines = File.ReadAllLines(filename);
foreach (string line in lines)
{
var parts = line.Split(' ');
ListViewItem lvi = new ListViewItem(parts[0]);
lvi.SubItems.Add(parts[1]);
listViewMain.Items.Add(lvi);
}
// update count
Helpers.returnMessage(File.ReadAllLines(ofDialog.FileName).Count() + " rows imported.");
}
catch (Exception ex)
{
Helpers.returnMessage(ex.Message);
}
The CSV contents look like:
URL Rating Domain Rating IP From Referring Page URL Referring Page Title Internal Links Count External Links Count Link URL TextPre Link Anchor TextPost Size Type NoFollow Site-wide Image Encoding Alt First Seen Previous Visited Last Check Original
24 89 91.198.174.192 http://en.wikipedia.org/wiki/Humbug_(sweet) "Humbug (sweet) - Wikipedia, the free encyclopedia" 118 16 http://www.bestbritishsweets.co.uk/user/products/large/everton.jpg http://www.bestbritishsweets.co.uk/user/products/large/everton.jpg 12163 href True False False utf8 2013-09-08T15:14:50Z 2015-03-11T01:48:40Z 2015-03-11T01:48:40Z True
There is no "," delimiter like in regular .csv files, and there are different numbers of spaces between some fields. I'm stuck on the best way to split each section and add it to the listview; I have a mental block lol
any help would be appreciated :)
cheers guys
Graham
To open the CSV file, I would first check whether it is actually a tab-separated file. If it is, you can use \t as the delimiter and read the file in much the same way as you do now.
Failing this, you could use a (very long and complicated) regex string to match the different "columns" as separate groups. The regex string would look something like:
\s+([0-9]*)\s+([0-9]*)\s+([0-9]*.[0-9]*.[0-9]*.[0-9]*)\s+([a-zA-Z:\/._\(\)]*)\s+(\"[a-zA-Z0-9 \-\(\),]*\")\s+([0-9]*)\s+([0-9]*)\s+([a-zA-Z:\/._\(\)]*)\s+([a-zA-Z:\/._\(\)]*)\s+([0-9]*)\s+([a-zA-Z]*)\s+(True|False)\s+(True|False)\s+(True|False)\s+([a-z0-9]*)\s+([0-9\-T:Z]*)\s+([0-9\-T:Z]*)\s+([0-9\-T:Z]*)\s+(True|False)
This would return each column as a different group, which you can access as detailed below:
var regex = new Regex(regexString);
foreach(var line in lines)
{
var match = regex.Match(line);
// Groups[0] is the whole match; the captured columns start at index 1
var urlRating = match.Groups[1].Value;
var domainRating = match.Groups[2].Value;
var ip = match.Groups[3].Value;
// ...
}
You can see more about the regex string I have created (and possibly simplify it/extend it for the additional lines) here: https://regex101.com/r/oN4tW3/1
For more on C# regex look here: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex(v=vs.110).aspx
Edit: I would avoid the regex method if the file is tab-separated, as the regex is more complex and fragile.
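If the file does turn out to be tab-separated, the reading side might be sketched as follows. TsvReader and ParseRows are hypothetical names, and the sketch assumes that the first line holds the column names and that fields never contain a tab themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TsvReader
{
    // Splits each non-empty line on tabs and strips surrounding quotes
    // from every field. skipHeader drops the first line (the column names).
    public static List<string[]> ParseRows(IEnumerable<string> lines, bool skipHeader = true)
    {
        return lines
            .Skip(skipHeader ? 1 : 0)
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split('\t')
                                .Select(field => field.Trim().Trim('"'))
                                .ToArray())
            .ToList();
    }
}
```

Each returned array can be handed to the ListViewItem(string[]) constructor, for example listViewMain.Items.Add(new ListViewItem(row)), which fills the first column and the sub-items in one step.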

Split string from text file

I'm trying to convert string to keys from a text file and I need to split text.
For example:
Code c#
string[] controls = File.ReadAllLines(FilePath);
Keys moveUp = (Keys)Enum.Parse(typeof(Keys), controls[1].Split("|", StringSplitOption.None), true);
In the text file, at line [1], I have:
moveUp |W;
I want to set the character W as the key.
Thanks for any replies, and sorry if my English looks weird.
If you are interested in the string after |, then this:
controls[1].Split("|", StringSplitOption.None)
should be replaced with this:
controls[1].Split('|')[1]
[1] returns the second element (index 1) of the array that Split() creates. For the line shown, that element is "W;", so trim the trailing ; (for example with .Trim(' ', ';')) before passing it to Enum.Parse.
If you are trying to read the first line of the file, then controls[1] should be controls[0], because arrays are zero-indexed.
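Put together, the parsing step might look like the sketch below. To keep it runnable outside a Windows Forms project it uses a small stand-in enum, DemoKeys, in place of System.Windows.Forms.Keys; that enum and the KeyBindingParser helper are hypothetical.

```csharp
using System;

// Stand-in for System.Windows.Forms.Keys so the sketch runs without WinForms.
public enum DemoKeys { W, A, S, D }

public static class KeyBindingParser
{
    // Takes a line such as "moveUp |W;" and returns the key named after the '|'.
    public static DemoKeys ParseKey(string line)
    {
        // Take the part after '|' and drop surrounding spaces and the ';'.
        string keyName = line.Split('|')[1].Trim(' ', ';');

        // The final 'true' makes the lookup case-insensitive.
        return (DemoKeys)Enum.Parse(typeof(DemoKeys), keyName, true);
    }
}
```

In the real program the same two steps would apply with typeof(Keys) instead of typeof(DemoKeys).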

auto detect tag within a text

Is there any library or algorithm that can automatically detect tags in a text (ignoring the common words of the chosen language)?
Something like this:
string[] keywords = GetKeyword("Your order is num #0123456789");
and keywords[] would contain "order" and "#0123456789"...?
Does it exist? Or does the user have to select all the tags of every document by hand every time?
foreach(string keyword in keywords) { // where keywords is a List<string>
if ("Your order is num #0123456789".Contains(keyword)) {
keywordsPresent.Add(keyword); // where keywordsPresent is a List<string>
}
}
return keywordsPresent;
The above does not cater for your #0123456789; for that, add some more logic, such as finding the index of the #.
Sorry, I misunderstood the question. If you want to look for specific words, the algorithm will depend on your strings. For example, you can use Split() to generate an array of words from one string, and then work with that, like this:
string[] words = "Your order is num #0123456789".Split(' ');
string orderNumber = "";
if (words.Contains("order") && words.Any(w => w.StartsWith("#")))
{
    orderNumber = words.First(w => w.StartsWith("#"));
}
This first generates an array of words from "Your order is num #0123456789"; then, if the array contains the word "order", it finds the first word that starts with "#" and selects it.
I think a lot of different algorithms can be used. Some are simple; others are very complex. I can suggest the following basic approach:
Split all the text into an array of words.
Remove stop words from the array. (Google "stop words list" to get a full list of stop words.)
Walk through the array and count the occurrences of each word.
Sort the words by their 'weight' (count) in the array.
Choose the required number of tags.
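The steps above can be sketched as follows. The stop-word list here is a tiny illustrative sample rather than a complete one, and TagExtractor and GetTags are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagExtractor
{
    // Tiny illustrative stop-word list; a real one would be much longer.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "a", "an", "the", "is", "are", "of", "to", "and", "your", "in" },
        StringComparer.OrdinalIgnoreCase);

    // Returns up to maxTags words, most frequent first.
    public static List<string> GetTags(string text, int maxTags)
    {
        return text
            // Step 1: split the text into words.
            .Split(new[] { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?' },
                   StringSplitOptions.RemoveEmptyEntries)
            // Step 2: remove stop words.
            .Where(word => !StopWords.Contains(word))
            // Step 3: count each word, ignoring case.
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            // Step 4: sort by count, highest first.
            .OrderByDescending(group => group.Count())
            // Step 5: keep the requested number of tags.
            .Take(maxTags)
            .Select(group => group.Key)
            .ToList();
    }
}
```

Because '#' is not treated as a separator, a token such as #0123456789 survives as a single candidate tag. Words with equal counts keep their order of first appearance.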

searching a textfile for a keyword

I have a text file with names such as balamurugan,chendurpandian,......
Suppose I enter a value such as ba in the textbox.
When I click a submit button, I have to search the text file for the value ba and display "pattern matched".
I have read the text file using
string FilePath = txtBoxInput.Text;
and displayed it in a textbox using
textBoxContents.Text = File.ReadAllText(FilePath);
But I don't know how to search for a word in a text file using C#. Can anyone give a suggestion?
You can simply use:
textBoxContents.Text.Contains(keyword)
This will return true if your text contains your chosen keyword.
It depends on the kind of pattern matching you need. You can use something as simple as the String.Contains method, or you can try regular expressions, which give you more control over how you search and return all matches at the same time. Here are a couple of links to get you started quickly with regular expressions:
http://www.codeproject.com/KB/dotnet/regextutorial.aspx
http://www.developer.com/open/article.php/3330231/Regular-Expressions-Primer.htm
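As a rough illustration of the regular-expression route, the sketch below collects every name that begins with the typed text. NameSearch and FindByPrefix are hypothetical names, and the pattern assumes the names consist of word characters only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameSearch
{
    // Returns every whole word in text that begins with prefix, ignoring case.
    public static List<string> FindByPrefix(string text, string prefix)
    {
        // Regex.Escape keeps characters such as '.' in the prefix literal.
        string pattern = @"\b" + Regex.Escape(prefix) + @"\w*";

        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
            .Cast<Match>()
            .Select(match => match.Value)
            .ToList();
    }
}
```

A non-empty result from FindByPrefix(textBoxContents.Text, "ba") would then mean the pattern matched.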
First, you should split up the input string, after which you can do a Contains on each value:
// On file read (the names are comma-separated):
String[] values = File.ReadAllText(FilePath).Split(',');
// On search:
List<String> results = new List<String>();
for(int i = 0; i < values.Length; i++) {
if(values[i].Contains(search)) results.Add(values[i]);
}
Alternatively, if you only want it to search at the beginning or the end of the string, you can use StartsWith or EndsWith, respectively:
// Only match beginning
values[i].StartsWith(search);
// Only match end
values[i].EndsWith(search);
