I can't work out how to search a text file and pull out the data I need using Model-View-ViewModel (MVVM).
Basically, I have to make a dictionary app, and the text file holds a word, a language and a description on each line, like:
cat; English; it is a four leg animal
In the view I have a text box where the user types a word, and two other boxes where the language and description of that word should be shown.
I just can't work out how to search in this file. I searched online but nothing seemed to answer my exact question.
Unless your file is going to change, you can get away with reading the entire file up front when the application starts and putting the data into a list of models for your view model.
As this is essentially a CSV file with one entry per line and a semicolon as the delimiter, we can use the .NET TextFieldParser (namespace Microsoft.VisualBasic.FileIO) to process the file into your models:
Basic Model:
public class DictionaryEntryModel {
public string Word { get; set; }
public string Language { get; set; }
public string Description { get; set; }
}
Example view model with a constructor to fill out your models:
public class DictionaryViewModel {
    // This will be an INotifyPropertyChanged-based property in your VM
    public List<DictionaryEntryModel> DictionaryEntries { get; set; }

    // filePath: location of the dictionary text file
    public DictionaryViewModel (string filePath) {
        DictionaryEntries = new List<DictionaryEntryModel>();
        // Create a parser with the [;] delimiter; TextFieldParser lives in
        // Microsoft.VisualBasic.FileIO, and the using block closes it for us
        using (var textFieldParser = new TextFieldParser(filePath))
        {
            textFieldParser.Delimiters = new string[] { ";" };
            while (!textFieldParser.EndOfData)
            {
                var entry = textFieldParser.ReadFields();
                DictionaryEntries.Add(new DictionaryEntryModel()
                {
                    Word = entry[0],
                    Language = entry[1],
                    Description = entry[2]
                });
            }
        }
    }
}
You can now bind your view to the DictionaryEntries property; as long as your app is open, it keeps the full file in memory as a list of DictionaryEntryModel.
Hope this helps!
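With the entries loaded, the lookup itself can be a LINQ query over the list. The following is only a minimal sketch: it repeats the DictionaryEntryModel from above so that it compiles on its own, and the DictionarySearch class and its Find method are hypothetical names introduced for illustration, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DictionaryEntryModel
{
    public string Word { get; set; }
    public string Language { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class DictionarySearch
{
    // Returns the first entry whose Word equals the search term (ignoring case
    // and surrounding whitespace), or null when nothing matches.
    public static DictionaryEntryModel Find(IEnumerable<DictionaryEntryModel> entries, string term)
    {
        if (string.IsNullOrWhiteSpace(term)) return null;
        string wanted = term.Trim();
        return entries.FirstOrDefault(e =>
            string.Equals(e.Word?.Trim(), wanted, StringComparison.OrdinalIgnoreCase));
    }
}
```

In the view model you could call Find from the setter of the property bound to the search text box, then copy Language and Description into the properties bound to the other two boxes and raise PropertyChanged for them.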
I'm not addressing the MVVM part here, just how to search the text file for a search term and get the resulting data, using a case-insensitive regex.
string dictionaryFileName = @"C:\Test\SampleDictionary.txt"; // replace with your file path
string searchedTerm = "Cat"; // Replace with user input word
string searchRegex = string.Format("^(?<Term>{0});(?<Lang>[^;]*);(?<Desc>.*)$", searchedTerm);
// Initialized to null so they can be read after the loop even when nothing matches
string foundTerm = null;
string foundLanguage = null;
string foundDescription = null;
using (var s = new StreamReader(dictionaryFileName, Encoding.UTF8))
{
string line;
while ((line = s.ReadLine()) != null)
{
var matches = Regex.Match(line, searchRegex, RegexOptions.IgnoreCase);
if (matches.Success)
{
foundTerm = matches.Groups["Term"].Value;
foundLanguage = matches.Groups["Lang"].Value;
foundDescription = matches.Groups["Desc"].Value;
break;
}
}
}
Then you can display the resulting strings to the user.
Note that this will work for typical input words, but it might produce strange results if the user inputs special characters that interfere with the regular expression syntax. Most of this might be corrected by utilizing Regex.Escape(searchedTerm).
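The Regex.Escape suggestion can be sketched as follows. This is an illustration under assumptions, not the answer's own code: the DictionaryLineSearch class and TryFind method are hypothetical names, and the lines are passed in as a sequence so the same code works with File.ReadLines or an in-memory list.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class DictionaryLineSearch
{
    // Looks for a "word;language;description" line whose word equals the search term.
    // Regex.Escape makes characters such as '.', '+' or '(' match literally.
    public static bool TryFind(IEnumerable<string> lines, string searchedTerm,
                               out string language, out string description)
    {
        string pattern = "^" + Regex.Escape(searchedTerm) + ";(?<Lang>[^;]*);(?<Desc>.*)$";
        foreach (string line in lines)
        {
            Match match = Regex.Match(line, pattern, RegexOptions.IgnoreCase);
            if (match.Success)
            {
                language = match.Groups["Lang"].Value.Trim();
                description = match.Groups["Desc"].Value.Trim();
                return true;
            }
        }
        language = null;
        description = null;
        return false;
    }
}
```

With the escaping in place, a search for "c.t" only matches a line that literally starts with "c.t;", not the "cat" entry.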
I want to read a text file dynamically based on its headers. Consider an example like this:
name|email|phone|othername|company
john|john@example.com|1234||example
doe|doe@example.com||pin
jane||98485|
The values should be read like this for the records above:
name  email             phone  othername  company
john  john@example.com  1234              example
doe   doe@example.com          pin
jane                    98485
I tried using this
using (StreamReader sr = new StreamReader(new MemoryStream(textFile)))
{
while (sr.Peek() >= 0)
{
string line = sr.ReadLine(); //Using readline method to read text file.
string[] strlist = line.Split('|'); //using string.split() method to split the string.
Obj obj = new Obj();
obj.Name = strlist[0].ToString();
obj.Email = strlist[1].ToString();
obj.Phone = strlist[2].ToString();
obj.othername = strlist[3].ToString();
obj.company = strlist[4].ToString();
}
}
The code above works when every line contains all the delimiters, but fails when trailing fields are left out, as in the example above. Is there a possible solution for this?
If you have any control over this, you should use a better serialization technology, or at least a CSV parser that can deal with this sort of format. However, if you want to use string.Split, you can also take advantage of ElementAtOrDefault:
Returns the element at a specified index in a sequence or a default
value if the index is out of range.
Given
public class Data
{
public string Name { get; set; }
public string Email { get; set; }
public string Phone { get; set; }
public string OtherName { get; set; }
public string Company { get; set; }
}
Usage
var results = File
.ReadLines(SomeFileName) // stream the lines from a file
.Skip(1) // skip the header
.Select(line => line.Split('|')) // split on pipe
.Select(items => new Data() // populate some funky class
{
Name = items.ElementAtOrDefault(0),
Email = items.ElementAtOrDefault(1),
Phone = items.ElementAtOrDefault(2),
OtherName = items.ElementAtOrDefault(3),
Company = items.ElementAtOrDefault(4)
});
foreach (var result in results)
Console.WriteLine($"{result.Name}, {result.Email}, {result.Phone}, {result.OtherName}, {result.Company}");
Output
john, john@example.com, 1234, , example
doe, doe@example.com, , pin,
jane, , 98485, ,
Splitting the line with string[] strlist = line.Split('|'); can give undesired results.
For example, jane||98485| produces an array of just 4 elements, as you can check online at https://rextester.com/WBOT6074.
You should check the array strlist after generating it, for instance by testing its length before indexing into it.
As you haven't given clear details about the problem, I cannot give a more specific answer.
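One way to make the reading follow the header line, rather than fixed indexes, is to map every row onto the header names and pad short rows. This is a sketch under assumptions: PipeFileReader and Parse are hypothetical names, and the first line is assumed to be the header.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answers.
public static class PipeFileReader
{
    // Maps each data line onto the header names from the first line.
    // Columns missing at the end of a short line become empty strings.
    public static List<Dictionary<string, string>> Parse(IEnumerable<string> lines)
    {
        var rows = new List<Dictionary<string, string>>();
        string[] headers = null;
        foreach (string line in lines)
        {
            string[] fields = line.Split('|');
            if (headers == null)
            {
                headers = fields; // the first line names the columns
                continue;
            }
            var row = new Dictionary<string, string>();
            for (int i = 0; i < headers.Length; i++)
            {
                row[headers[i]] = i < fields.Length ? fields[i] : string.Empty;
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

For the sample file, the row for doe has "pin" under othername and an empty company, even though that line carries only four fields.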
I'm trying to learn some C# here. My goal is to create and write to multiple files whose names vary based on part of the string being written. Some examples below:
Let's say strings to be written are basically rows of a csv file:
2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;
2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;G
2019-10-29 16:13:11;;18;0;;3;false;false;true;
As you may notice, the first field of each string is a date, and that is the key field for choosing the name of the file to write to.
The first two rows have the same date, so both will be written to a single file; the third goes to a second file since it has a different date.
Expected Result:
First File:
2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;
2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;
Second File:
2019-10-29 16:13:11;;18;0;;3;false;false;true;
Now I have multiple rows like those, and I'd like to print them on different files based on their first value.
I managed to create a class which might represent each row:
class Value {
public DateTime date = DateTime.Now;
public decimal cod = 0;
public decimal quantity = 0;
public decimal price = 0;
//Other irrelevant fields
}
And I also tried to develop a method to write a single Value on given File:
private static void WriteValue(Value content, string folder, string fileName) {
    using(StreamWriter writer = new StreamWriter(Path.Combine(folder, fileName), true, Encoding.ASCII)) {
        // Field names follow the Value class above (date, cod, quantity)
        writer.Write(content.date.ToString("yyyyMMdd"));
        writer.Write("0000");
        writer.Write("I");
        writer.Write("C");
        writer.Write(content.cod.ToString().PadLeft(14, '0'));
        writer.Write(Convert.ToInt64(content.quantity * 100).ToString().PadLeft(8, '0'));
        writer.WriteLine();
    }
}
And a method to write the Values into files:
static void WriteValues(List<Value> fileContent) {
//Once I got all Values of File in a List of Values, I try to write them in files
if(fileContent.Count > 0) {
foreach(Value riga in fileContent) {
//Temp Dates, used to compare previous Date in order to know if I have to write Value in new File or it can be written on same File
string dataTemp = riga.date.ToString("yyyy-MM-dd");
string lastData = string.Empty;
string FileName = "ordinivoa_999999." + DateTime.Now.ToString("yyMMddHHmmssfff");
//If lastData is Empty we are writing first value
if (string.IsNullOrEmpty(lastData)) {
WriteValue(riga, toLinfaFolder, FileName);
lastData = dataTemp;
}
//Else if lastData is equal as last managed date we write on same file
else if (lastData == dataTemp) {
WriteValue(riga, toLinfaFolder, FileName);
}
else {
//Else current date of Value is new, so we write it in another file
string newFileName = "ordinivoa_999999." + DateTime.Now.AddMilliseconds(1).ToString("yyMMddHHmmssfff");
WriteValue(riga, toLinfaFolder, newFileName);
lastData = dataTemp;
}
}
}
}
My issue is that the method above behaves strangely: it writes the first run of equal dates to a single file, which is good, but then writes all other values to one file, even when they have different dates.
How can I make sure values are written to the same file only if they have the same date?
You can group equal dates easily with a LINQ query
private static void WriteValues(List<Value> fileContent)
{
var dateGroups = fileContent
// Key on the day only (no time part) so rows sharing a date share a file
.GroupBy(v => $"ordinivoa_999999.{v.date:yyMMdd}");
foreach (var group in dateGroups) {
string path = Path.Combine(toLinfaFolder, group.Key);
using (var writer = new StreamWriter(path, true, Encoding.ASCII)) {
foreach (Value item in group) {
//TODO: write item to file
writer.WriteLine(...
}
}
}
}
Since a DateTime stores values in units of one ten-millionth of a second, two dates looking equal once formatted, might still be different. So I suggest grouping on the filename to avoid this effect. I used string interpolation to create and format the file name.
Don't open and close the file for each text line.
At the top of your code file you need a
using System.Linq;
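To see why grouping on a formatted key works, here is a small sketch in which values that differ in their time part still land in one group, because the key contains only the date. The DateGrouping class and its methods are hypothetical names; the "ordinivoa_999999." prefix is taken from the question, and the invariant culture is an added assumption so the key does not depend on the machine's calendar settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class DateGrouping
{
    // Builds one key per calendar day.
    public static string FileKey(DateTime value)
    {
        return "ordinivoa_999999." + value.ToString("yyMMdd", CultureInfo.InvariantCulture);
    }

    // Groups the timestamps by their formatted key and counts the items per key.
    public static Dictionary<string, int> CountPerFile(IEnumerable<DateTime> values)
    {
        return values
            .GroupBy(v => FileKey(v))
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For the three sample rows in the question this gives two keys: one holding the two rows from 2019-10-28 and one holding the row from 2019-10-29.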
You are on the right path declaring a class, but you're also doing a whole bunch of unnecessary stuff. Using LINQ this can be simplified by a great deal.
First I define a class, and since all you want to do is write each record, I would use a DateTime field, and a string field for the entire raw record.
class MyRecordOfSomeType
{
public DateTime Date { get; set; }
public string RawData { get; set; }
}
The DateTime field will come in handy when you're doing the LINQ grouping.
Now we iterate through your data, split using ;, then create your class instance list.
var data = new List<string>()
{
"2019-10-28 16:14:14;;15.5;0;;3;false;false;0;111;123;;;10;false;1;2.5;;;;0;",
"2019-10-28 16:13:11;;18;0;;1;false;false;222;333;123;;;10;false;1;1;;;;0;G",
"2019-10-29 16:13:11;;18;0;;3;false;false;true;"
};
var records = new List<MyRecordOfSomeType>();
foreach (var item in data)
{
var parts = item.Split(';');
DateTime.TryParse(parts[0], out DateTime result);
var rec = new MyRecordOfSomeType() { Date = result, RawData = item };
records.Add(rec);
}
Then we group by date. Note that it's important to group by the Date component of the DateTime structure, otherwise it will consider the Time component as well and you'll have more files than you need.
var groups = records.GroupBy(x => x.Date.Date);
Finally, iterate your groups, and write contents of each group to a new file.
foreach (var group in groups)
{
var fileName = string.Format("ordinivoa_999999_{0}.csv", group.Key.ToString("yyMMddHHmmssfff"));
File.WriteAllLines(fileName, group.Select(x => x.RawData));
}
I have a .txt file with a list of items (U.S. states and their capitals), one per line, such as:
Arizona:Phoenix
Arkansas:Little Rock
California:Sacramento
I'm going to import that list, but only want to display the states in a ComboBox. After that, if comboBox1.Items[0] is selected, I want to get the corresponding item that was parsed along with it after the : delimiter.
My initial solution was to create a class to hold both values, keep the instances in a List, and compare the index from the ComboBox to that of the List to get the matching value. I feel like this might be overkill and that I am overthinking it for something as simple as a ComboBox whose data won't be subjected to any complex manipulation.
Would there be a simpler method or data type for this? I just want to get the value after the : delimiter that corresponds to the selected ComboBox index.
First of all build your classes of State & Capital like this:
public class State
{
public string stateName { get; set; }
public Capital capital { get; set; }
}
public class Capital
{
public string capitalName { get; set; }
}
Read the text file, generate a list and populate the ComboBox like this:
List<State> list = new List<State>();
var file = File.ReadAllLines(FilePath).ToList();
foreach (var item in file)
list.Add(new State()
{
stateName = item.Split(':')[0],
capital = new Capital() { capitalName = item.Split(':')[1] }
});
StatesCB.DataSource = list.Select(x => x.stateName).ToList();
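And within your ComboBox SelectedIndexChanged event handler, get the capital based on the state (list must be a field of the form so the handler can see it):
private void States_SelectedIndexChanged(object sender, EventArgs e)
{
    // SelectedItem is an object; cast it so the comparison is by string value
    string selectedState = StatesCB.SelectedItem as string;
    capital.Text = list.Where(x => x.stateName == selectedState)
        .Select(x => x.capital.capitalName).FirstOrDefault();
}
It works and addresses your problem.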
You can try this:
I assume your text file contains the following lines:
Arizona:Phoenix
Arkansas:Little Rock
California:Sacramento
In your code:
List<string> lstResult = new List<string>();
using (StreamReader sr = new StreamReader(@"C:\Stack\file.txt"))
{
string line = string.Empty;
while ((line = sr.ReadLine()) != null)
{
//Here I am getting the second part of splitted string which is your requirement
lstResult.Add(line.Split(':').Select(x=>x).Skip(1).SingleOrDefault().ToString());
}
}
comboBox1.DataSource = lstResult;
This fills the ComboBox with the part of each line after the colon.
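For the "simpler data type" the question asks about, a Dictionary keyed by state name is one option. This is only a sketch under assumptions: StateCapitals and Load are hypothetical names, and each line is assumed to have the State:Capital shape shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answers.
public static class StateCapitals
{
    // Builds a state -> capital lookup from "State:Capital" lines; malformed lines are skipped.
    public static Dictionary<string, string> Load(IEnumerable<string> lines)
    {
        var capitals = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            // Split on the first ':' only, so anything after it stays in one piece.
            string[] parts = line.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
            {
                capitals[parts[0].Trim()] = parts[1].Trim();
            }
        }
        return capitals;
    }
}
```

The ComboBox can then be bound to a list built from capitals.Keys (sorted, if the display order matters), and the SelectedIndexChanged handler can look up capitals[(string)comboBox1.SelectedItem] without comparing indexes.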
I am implementing an integration with NetSuite in C#. In the external system I need to populate a list of countries that will match NetSuite's country list.
The NetSuite Web Service provides an enumeration called Country:
public enum Country {
_afghanistan,
_alandIslands,
_albania,
_algeria,
...
You can also get a list of country names and codes (albeit in a not so straightforward way) from the web service. (See: http://suiteweekly.com/2015/07/netsuite-get-all-country-list/)
Which gives you access to values like this:
Afghanistan, AF
Aland Islands, AX
Albania, AL
Algeria, DZ
American Samoa, AS
...
But, as you can see, there is no way to link the two together. (I tried to match by index, but that didn't work and sounds scary anyway.)
NetSuite's "help" files have a list. But this is static, and I really want a dynamic solution that updates as NetSuite updates, because we know countries will change, even if not that often.
Screenshot of Country Enumerations from NetSuite help docs
The only solutions I have found online are from people who have provided static data that maps the two sets of data (e.g. suiteweekly.com/2015/07/netsuite-complete-country-list-in-netsuite/).
I cannot (don't want to) believe that this is the only solution.
Anyone else have experience with this that has a better solution?
NetSuite, if you are reading, come on guys, give a programmer a break.
The best solution I have come up with is to leverage the apparent relationship between the country name and the enumeration key to forge a link between the two. I am sure others could improve on this solution, but what I would really like to see is a solution that isn't a hack relying on an apparent pattern, but rather one that is based on an explicit connection. Or better yet, NetSuite should just provide the data all together in one place.
For example you can see the apparent relationship here:
_alandIslands -> Aland Islands
With a little code I can try to forge a match.
I first get the Enumeration Keys into an array. And I create a list of objects of type NetSuiteCountry that will hold my results.
var CountryEnumKeys = Enum.GetNames(typeof(Country)); // spelled as it is used in the loop below
var countries = new List<NetSuiteCountry>();
I then loop through the list of country Name and Code I got using the referenced code above (not shown here).
For each country name I strip all non-word characters with Regex.Replace, prepend an underscore (_) and convert the string to lowercase. Finally, I try to find a match between the enumeration key (converted to lowercase as well) and the matcher string that was created. If a match is found, I save all the data together in the countries list.
UPDATE: Based on the comments I have added additional code/hacks to try to deal with the anomalies without hard-coding exceptions. Hopefully these updates will catch any future updates to the country list as well, but no promises. As of this writing it was able to handle all the known anomalies. In my case I needed to ignore Deprecated countries so those aren't included.
foreach (RecordRef baseRef in baseRefList)
{
var name = baseRef.name;
//Skip Deprecated countries
if (name.EndsWith("(Deprecated)")) continue;
//Use the name to try to find and enumkey match and only add a country if found.
var enumMatcher = $"_{Regex.Replace(name, @"\W", "").ToLower()}";
//Compares Ignoring Case and Diacritic characters
var enumMatch = CountryEnumKeys.FirstOrDefault(e => string.Compare(e, enumMatcher, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0);
//Then try by Enum starts with Name but only one.
if (enumMatch == null)
{
var matches = CountryEnumKeys.Where(e => e.ToLower().StartsWith(enumMatcher));
if (matches.Count() == 1)
{
Debug.Write($"- Country Match Hack 1 : ");
enumMatch = matches.First();
}
}
//Then try by Name starts with Enum but only one.
if (enumMatch == null)
{
var matches = CountryEnumKeys.Where(e => enumMatcher.StartsWith(e.ToLower()));
if (matches.Count() == 1)
{
Debug.Write($"- Country Match Hack 2 : ");
enumMatch = matches.First();
}
}
//Finally try by first half Enum and Name match but again only one.
if (enumMatch == null)
{
var matches = CountryEnumKeys.Where(e => e.ToLower().StartsWith(enumMatcher.Substring(0, (enumMatcher.Length/2))));
if (matches.Count() == 1)
{
Debug.Write($"- Country Match Hack 3 : ");
enumMatch = matches.First();
}
}
if (enumMatch != null)
{
var enumIndex = Array.IndexOf(CountryEnumKeys, enumMatch);
if (enumIndex >= 0)
{
var country = (Country) enumIndex;
var nsCountry = new NetSuiteCountry
{
Name = baseRef.name,
Code = baseRef.internalId,
EnumKey = country.ToString(),
Country = country
};
Debug.WriteLine($"[{nsCountry.Name}] as [{nsCountry.EnumKey}]");
countries.Add(nsCountry);
}
}
else
{
Debug.WriteLine($"Could not find Country match for: [{name}] as [{enumMatcher}]");
}
}
Here is my NetSuiteCountry class:
public class NetSuiteCountry
{
public string Name { get; set; }
public string Code { get; set; }
public string EnumKey { get; set; }
public Country Country { get; set; }
}
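The core of the matching step can be tried in isolation with a plain string array standing in for Enum.GetNames(typeof(Country)). This is a reduced sketch, not the full loop above: CountryKeyMatcher, ToMatcher and FindKey are hypothetical names, the fallback "hacks" are left out, and the invariant culture is used in place of the current culture to keep the comparison machine-independent.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class CountryKeyMatcher
{
    // "Aland Islands" -> "_alandislands": strip non-word characters, lowercase, prefix "_".
    public static string ToMatcher(string countryName)
    {
        return "_" + Regex.Replace(countryName, @"\W", "").ToLowerInvariant();
    }

    // Returns the first enum key equal to the matcher when case and non-spacing marks
    // are ignored, or null when no key matches.
    public static string FindKey(string[] enumKeys, string countryName)
    {
        string matcher = ToMatcher(countryName);
        return enumKeys.FirstOrDefault(key => string.Compare(
            key, matcher, CultureInfo.InvariantCulture,
            CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0);
    }
}
```

Names that do not follow the pattern, such as the ones the fallback steps above are meant to catch, simply come back as null here.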
Let me start off with a disclaimer that I'm not a coder, and this is the first day I've tried to look at a C# program.
I need something similar for a JavaScript project where I need the complete list of NetSuite country names, codes and their numeric values, and from reading the help it seemed like the only way was through web services.
I downloaded the sample application for webservices from Netsuite and a version of Visual Studio and I was able to edit the sample program provided to create a list of all of the country names and country codes (ex. Canada, CA).
I started out doing something similar to the previous poster to get the list of country names:
string[] countryList = Enum.GetNames(typeof(Country));
foreach (string s in countryList)
{
_out.writeLn(s);
}
But I later got rid of this and started a new technique. I created a class similar to the previous answer:
public class NS_Country
{
public string countryCode { get; set; }
public string countryName { get; set; }
public string countryEnum { get; set; }
public string countryNumericID { get; set; }
}
Here is the new code for getting the list of country names, codes and IDs. I realize that it's not very efficient; as I mentioned before, I'm not really a coder and this is my first attempt with C#, with lots of Google and cutting/pasting ;D.
_out.writeLn(" Attempting to get Country list.");
// Create a list for the NS_Country objects
List<NS_Country> CountryList = new List<NS_Country>();
// Create a new GetSelectValueFieldDescription object to use in a getSelectValue search
GetSelectValueFieldDescription countryDesc = new GetSelectValueFieldDescription();
countryDesc.recordType = RecordType.customer;
countryDesc.recordTypeSpecified = true;
countryDesc.sublist = "addressbooklist";
countryDesc.field = "country";
// Create a GetSelectValueResult object to hold the results of the search
GetSelectValueResult myResult = _service.getSelectValue(countryDesc, 0);
BaseRef[] baseRef = myResult.baseRefList;
foreach (BaseRef nsCountryRef in baseRef)
{
// Didn't know how to do this more efficiently
// Get the type for the BaseRef object, get the property for "internalId",
// then finally get it's value as string and assign it to myCountryCode
string myCountryCode = nsCountryRef.GetType().GetProperty("internalId").GetValue(nsCountryRef).ToString();
// Create a new NS_Country object
NS_Country countryToAdd = new NS_Country
{
countryCode = myCountryCode,
countryName = nsCountryRef.name,
// Call to a function to get the enum value based on the name
countryEnum = getCountryEnum(nsCountryRef.name)
};
try
{
// If the country enum was verified in the Countries enum
if (!String.IsNullOrEmpty(countryToAdd.countryEnum))
{
int countryEnumIndex = (int)Enum.Parse(typeof(Country), countryToAdd.countryEnum);
Debug.WriteLine("Enum: " + countryToAdd.countryEnum + ", Enum Index: " + countryEnumIndex);
_out.writeLn("ID: " + countryToAdd.countryCode + ", Name: " + countryToAdd.countryName + ", Enum: " + countryToAdd.countryEnum);
}
}
// There was a problem locating the country enum that was not handled
catch (Exception ex)
{
Debug.WriteLine("Enum: " + countryToAdd.countryEnum + ", Enum Index Not Found");
_out.writeLn("ID: " + countryToAdd.countryCode + ", Name: " + countryToAdd.countryName + ", Enum: Not Found");
}
// Add the countryToAdd object to the CountryList
CountryList.Add(countryToAdd);
}
// Create a JSON - I need this for my javascript
var javaScriptSerializer = new System.Web.Script.Serialization.JavaScriptSerializer();
string jsonString = javaScriptSerializer.Serialize(CountryList);
Debug.WriteLine(jsonString);
In order to get the enum values, I created a function called getCountryEnum:
static string getCountryEnum(string countryName)
{
// Create a dictionary for looking up the exceptions that can't be converted
// Don't know what Netsuite was thinking with these ones ;D
Dictionary<string, string> dictExceptions = new Dictionary<string, string>()
{
{"Congo, Democratic Republic of", "_congoDemocraticPeoplesRepublic"},
{"Myanmar (Burma)", "_myanmar"},
{"Wallis and Futuna", "_wallisAndFutunaIslands"}
};
// Replace "'s" in the country names with "s"
string countryName2 = Regex.Replace(countryName, @"\'s", "s");
// Call a function that replaces accented characters with non-accented equivalent
countryName2 = RemoveDiacritics(countryName2);
countryName2 = Regex.Replace(countryName2, @"\W", " ");
string[] separators = {" ","'"}; // "'" required to deal with country names like "Cote d'Ivoire"
string[] words = countryName2.Split(separators, StringSplitOptions.RemoveEmptyEntries);
for (var i = 0; i < words.Length; i++)
{
string word = words[i];
if (i == 0)
{
words[i] = char.ToLower(word[0]) + word.Substring(1);
}
else
{
words[i] = char.ToUpper(word[0]) + word.Substring(1);
}
}
string countryEnum2 = "_" + String.Join("", words);
// return an empty string if the country name contains Deprecated
bool b = countryName.Contains("Deprecated");
if (b)
{
return String.Empty;
}
else
{
// test to see if the country name was one of the exceptions
string test;
bool isExceptionCountry = dictExceptions.TryGetValue(countryName, out test);
if (isExceptionCountry == true)
{
return dictExceptions[countryName];
}
else
{
return countryEnum2;
}
}
}
In the above I used a function, RemoveDiacritics I found here. I will repost the referenced function below:
static string RemoveDiacritics(string text)
{
string formD = text.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
foreach (char ch in formD)
{
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
if (uc != UnicodeCategory.NonSpacingMark)
{
sb.Append(ch);
}
}
return sb.ToString().Normalize(NormalizationForm.FormC);
}
Here are the tricky cases to test any solution you develop with:
// Test tricky names
Debug.WriteLine(getCountryEnum("Curaçao"));
Debug.WriteLine(getCountryEnum("Saint Barthélemy"));
Debug.WriteLine(getCountryEnum("Croatia/Hrvatska"));
Debug.WriteLine(getCountryEnum("Korea, Democratic People's Republic"));
Debug.WriteLine(getCountryEnum("US Minor Outlying Islands"));
Debug.WriteLine(getCountryEnum("Cote d'Ivoire"));
Debug.WriteLine(getCountryEnum("Heard and McDonald Islands"));
// Enums that fail
Debug.WriteLine(getCountryEnum("Congo, Democratic Republic of")); // _congoDemocraticPeoplesRepublic added to exceptions
Debug.WriteLine(getCountryEnum("Myanmar (Burma)")); // _myanmar added to exceptions
Debug.WriteLine(getCountryEnum("Netherlands Antilles (Deprecated)")); // Skip Deprecated
Debug.WriteLine(getCountryEnum("Serbia and Montenegro (Deprecated)")); // Skip Deprecated
Debug.WriteLine(getCountryEnum("Wallis and Futuna")); // _wallisAndFutunaIslands added to exceptions
For my purposes I wanted a JSON object that had all the values for countries (Name, Code, Enum, Value). I'll include it here in case anyone is searching for it. The numeric values are useful when you have a third-party HTML form that has to forward the information to a NetSuite online form.
Here is a link to the JSON object on Pastebin.
My apologies for the lack of programming knowledge (I only really do a bit of JavaScript); hopefully this additional information will be useful to someone.
I am trying to make an add-on for a game named Tibia.
On their website, Tibia.com, you can look up characters and see their deaths.
For example:
http://www.tibia.com/community/?subtopic=characters&name=Kixus
Now I want to read the deaths data by using Regex in my C# application.
But I cannot seem to work it out, I've been spending hours and hours on
http://myregextester.com/index.php
The expression I use is :
<tr bgcolor=(?:"#D4C0A1"|"#F1E0C6") ><td width="25%" valign="top" >(.*?)&#160;CET</td><td>((?:Died|Killed) at Level ([^ ]*)|and) by (?:<[^>]*>)?([^<]*).</td></tr>
But I cannot make it work.
I want the Timestamp, creature / player Level, and creature / player name
Thanks in advance.
-Regards
It's a bad idea to use regular expressions to parse HTML. They're a very poor tool for the job. If you're parsing HTML, use an HTML parser.
For .NET, the usual recommendation is to use the HTML Agility Pack.
As suggested by Joe White, you would have a much more robust implementation if you use an HTML parser for this task. There is plenty of support for this on StackOverflow: see here for example.
If you really have to use regexes
I would recommend breaking your solution down into simpler regexes that can be applied using a top-down parsing approach to get the results.
For example:
1. Use a regex on the whole page which matches the character table. I would suggest matching the shortest unique string before and after the table rather than the table itself, and capturing the table using a group, since this avoids having to deal with the possibility of nested tables.
2. Use a regex on the character table that matches table rows.
3. Use a regex on the first cell to match the date.
4. Use a regex on the second cell to match links.
5. Use a regex on the second cell to match the player's level.
6. Use a regex on the second cell to match the killer's name if it was a creature (there are no links in the cell).
This will be much more maintainable if the site changes its HTML structure significantly.
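The row, cell and level steps above can be sketched like this. It is a rough illustration under assumptions, not a tested scraper: DeathRowParser and Parse are hypothetical names, the input is assumed to be the already isolated deaths table, and every death row is assumed to hold exactly two cells, with the word "level" followed by a number in the second one.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class DeathRowParser
{
    // Matches every <tr>...</tr> in the table fragment.
    private static readonly Regex RowRegex =
        new Regex(@"<tr\b[^>]*>(?<row>.*?)</tr>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Matches the cells inside a single row.
    private static readonly Regex CellRegex =
        new Regex(@"<td\b[^>]*>(?<cell>.*?)</td>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Small, focused patterns for the contents of the second cell.
    private static readonly Regex LevelRegex = new Regex(@"level (?<level>[0-9]+)", RegexOptions.IgnoreCase);
    private static readonly Regex TagRegex = new Regex(@"<[^>]+>");

    public static List<(string Date, int Level, string Text)> Parse(string tableHtml)
    {
        var result = new List<(string Date, int Level, string Text)>();
        foreach (Match row in RowRegex.Matches(tableHtml))
        {
            MatchCollection cells = CellRegex.Matches(row.Groups["row"].Value);
            if (cells.Count != 2) continue; // not a death row, e.g. the title row

            string date = WebUtility.HtmlDecode(cells[0].Groups["cell"].Value).Trim();
            // Drop the link tags, keep their text, then decode entities.
            string text = WebUtility.HtmlDecode(TagRegex.Replace(cells[1].Groups["cell"].Value, "")).Trim();

            Match level = LevelRegex.Match(text);
            if (!level.Success) continue; // no level found in this row
            result.Add((date, int.Parse(level.Groups["level"].Value), text));
        }
        return result;
    }
}
```

Each regex only has to understand one level of the structure, so a change in the row markup touches a single pattern. Extracting the individual killer names from the links would be a further small regex on the second cell.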
A complete working implementation using the HTML Agility Pack
You can download the library from the HTML Agility Pack site on CodePlex.
// This class is used to represent the extracted details
public class DeathDetails
{
public DeathDetails()
{
this.KilledBy = new List<string>();
}
public string DeathDate { get; set; }
public List<String> KilledBy { get; set; }
public int PlayerLevel { get; set; }
}
public class CharacterPageParser
{
public string CharacterName { get; private set; }
public CharacterPageParser(string characterName)
{
this.CharacterName = characterName;
}
public List<DeathDetails> GetDetails()
{
string url = "http://www.tibia.com/community/?subtopic=characters&name=" + this.CharacterName;
string content = GetContent(url);
HtmlDocument document = new HtmlDocument();
document.LoadHtml(content);
HtmlNodeCollection tables = document.DocumentNode.SelectNodes("//div[@id='characters']//table");
HtmlNode table = GetCharacterDeathsTable(tables);
List<DeathDetails> deaths = new List<DeathDetails>();
for (int i = 1; i < table.ChildNodes.Count; i++)
{
DeathDetails details = BuildDeathDetails(table, i);
deaths.Add(details);
}
return deaths;
}
private static string GetContent(string url)
{
using (System.Net.WebClient c = new System.Net.WebClient())
{
string content = c.DownloadString(url);
return content;
}
}
private static DeathDetails BuildDeathDetails(HtmlNode table, int i)
{
DeathDetails details = new DeathDetails();
HtmlNode tableRow = table.ChildNodes[i];
//every row should have two cells in it
if (tableRow.ChildNodes.Count != 2)
{
throw new Exception("Html format may have changed");
}
HtmlNode deathDateCell = tableRow.ChildNodes[0];
details.DeathDate = System.Net.WebUtility.HtmlDecode(deathDateCell.InnerText);
HtmlNode deathDetailsCell = tableRow.ChildNodes[1];
// get inner text to parse for player level and or creature name
string deathDetails = System.Net.WebUtility.HtmlDecode(deathDetailsCell.InnerText);
// get player level using regex
Match playerLevelMatch = Regex.Match(deathDetails, @" level ([\d]+) ", RegexOptions.IgnoreCase);
int playerLevel = 0;
if (int.TryParse(playerLevelMatch.Groups[1].Value, out playerLevel))
{
details.PlayerLevel = playerLevel;
}
if (deathDetailsCell.ChildNodes.Count > 1)
{
// death details contains links which we can parse for character names
foreach (HtmlNode link in deathDetailsCell.ChildNodes)
{
if (link.OriginalName == "a")
{
string characterName = System.Net.WebUtility.HtmlDecode(link.InnerText);
details.KilledBy.Add(characterName);
}
}
}
else
{
// player was killed by a creature - capture creature name
Match creatureMatch = Regex.Match(deathDetails, " by (.*)", RegexOptions.IgnoreCase);
string creatureName = creatureMatch.Groups[1].Value;
details.KilledBy.Add(creatureName);
}
return details;
}
private static HtmlNode GetCharacterDeathsTable(HtmlNodeCollection tables)
{
foreach (HtmlNode table in tables)
{
// Get first row
HtmlNode tableRow = table.ChildNodes[0];
// check to see if contains enough elements
if (tableRow.ChildNodes.Count == 1)
{
HtmlNode tableCell = tableRow.ChildNodes[0];
string title = tableCell.InnerText;
// skip this table if it doesn't have the right title
if (title == "Character Deaths")
{
return table;
}
}
}
return null;
}
}
And an example of it in use:
CharacterPageParser kixusParser = new CharacterPageParser("Kixus");
foreach (DeathDetails details in kixusParser.GetDetails())
{
Console.WriteLine("Player at level {0} was killed on {1} by {2}", details.PlayerLevel, details.DeathDate, string.Join(",", details.KilledBy));
}
You can also use the Espresso tool to work out a proper regular expression.
To properly escape all special characters that are not meant as regular expression syntax, you can use the Regex.Escape method:
string escapedText = Regex.Escape("<td width=\"25%\" valign=\"top\" >");
Try this:
http://jsbin.com/atupok/edit#javascript,html
and continue from there. I did most of the work here :)
Edit:
http://jsbin.com/atupok/3/edit
And start using this tool
http://regexr.com?2vrmf
instead of the one you have.