I need to load a long string from the internet and I have done that. Now I need to find the H1 header tag and print the contents.
What is the shortest or the easiest way to do that?
for (int x = 0; x < tempString.Length; x++)
{
if (write == 2)
{
name =name + tempString[x];
lenght++;
}
if (tempString[x] == '<' && tempString[x] == 'h' && tempString[x] == '1' )
write = 1;
if (write == 1 && tempString[x] == '>')
write = 2;
if (tempString[x] == '-' && write == 1)
write = 0;
}
I know it's a bit, how shall I say, odd, but it's all I have.
Use the HTML Agility Pack - Pretty much anything else you try is just going to cause you a headache.
HtmlAgility sample:
var html = "<html><head></head><body><h1>hello</h1></body></html>";
HtmlDocument d = new HtmlDocument();
d.LoadHtml(html);
var h1Contents = d.DocumentNode.SelectSingleNode("//h1").InnerText;
If you want to do it in flat C#, and you're only looking at 1 tag:
int first_tag = str.IndexOf("<H1>");
int last_tag = str.IndexOf("</H1>");
string text = str.Substring(first_tag + 4, last_tag - first_tag - 4); // length excludes the 4 characters of "<H1>"
Use an HTML library!
Otherwise try:
String.IndexOf(String x )
http://msdn.microsoft.com/en-us/library/k8b1470s.aspx
You can use that to get the index of the first start tag and of the end tag, and then just read between those indices.
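A minimal sketch of that IndexOf approach, with no external libraries. The class and method names (H1Extractor, FirstH1) are made up for illustration, and it assumes you only care about the first h1 element and that the markup is reasonably well formed:

```csharp
using System;

static class H1Extractor
{
    // Returns the raw text between the first <h1 ...> tag and the next </h1>,
    // or null if no complete h1 element is found.
    public static string FirstH1(string html)
    {
        // Case-insensitive search so both <h1> and <H1> are found.
        int open = html.IndexOf("<h1", StringComparison.OrdinalIgnoreCase);
        if (open < 0) return null;

        // Skip over any attributes to the end of the opening tag.
        int contentStart = html.IndexOf('>', open);
        if (contentStart < 0) return null;
        contentStart++;

        int close = html.IndexOf("</h1", contentStart, StringComparison.OrdinalIgnoreCase);
        if (close < 0) return null;

        return html.Substring(contentStart, close - contentStart);
    }
}
```

Calling H1Extractor.FirstH1(tempString) gives you the header text in one go. Anything nested inside the h1 (links, spans) comes back as raw markup, which is where a real HTML parser starts to pay off.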
The System.String class has methods like IndexOf(String), which reports the zero-based index of the first occurrence of the specified string.
So in your case, you could pass in "<H1>". Then you could get a substring starting at that point, and then call this method again looking for "</H1>" again.
Or, if you want, it might be easier to use regular expressions in .NET. Those are found in the System.Text.RegularExpressions namespace. They are definitely more complicated, but I'm sure you could practice using some small samples and learn the power of the dark side! (errr....) the power of regular expressions! :)
[edit] Now that I see the other answers, I definitely agree with them. If you need to do anything more complicated than getting one item out of an HTML-formatted string, use an HTML parser.
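For the regular-expression route, something along these lines should do for a single tag. It is only a sketch: the names H1Regex and FirstH1 are invented here, and the pattern assumes the h1 element is not nested inside another h1.

```csharp
using System.Text.RegularExpressions;

static class H1Regex
{
    // Matches <h1 ...>content</h1>, ignoring case and letting "." span line breaks.
    static readonly Regex Pattern = new Regex(
        @"<h1\b[^>]*>(.*?)</h1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the content of the first h1 element, or null if there is none.
    public static string FirstH1(string html)
    {
        Match m = Pattern.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

The lazy quantifier (.*?) stops at the first closing tag, so a page with several headers still yields only the first one.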
All of the above work fine; I just can't use any external libraries.
This works well for me:
for (int x = 0; x < tempString.Length; x++)
{
if (tempString[x] == '-' && write == 2)
{ write = 0; }
if (write == 2)
{
title =title + tempString[x];
lenght++;
}
if (x + 2 < tempString.Length && tempString[x] == '<' && tempString[x+1] == 'h' && tempString[x+2] == '1' ) // bounds check avoids reading past the end
{ write = 1; }
if (write == 1 && tempString[x] == '>')
{ write = 2; }
}
I have browsed a few related articles, but haven't quite found a solution that fits my query.
In a large plain-text file (~150MB, ~1.800.000 lines) I want to quickly find specific lines that have certain features using C#.
Each line has 132 characters, every one has a region-, a section-, a subsection code and an ident.
The combination of these 4 characteristics is unique.
Depending on the section code, the exact location of the other parts may differ.
Essentially, I want to retrieve up to ~50 elements with one method, that ideally takes less than a second.
The code I have so far works, but is way too slow for my purposes (~29 seconds of execution for 30 entries):
//icaoCode is always 2 char long
public static List<Waypoint> Retrieve(List<(string ident, string icaoCode, char sectionCode, char subSectionCode)> wpData)
{
List<Waypoint> result = new List<Waypoint>();
using StreamReader reader = new StreamReader(dataFile);
while (!reader.EndOfStream)
{
string data = reader.ReadLine();
if (data.Length != 132) continue;
foreach(var x in wpData)
{
int subsPos = (x.sectionCode, x.subSectionCode) switch
{
('P', 'N') => 5,
('P', _) => 12,
(_, _) => 5
};
if (data[4].Equals(x.sectionCode) && data[subsPos].Equals(x.subSectionCode))
{
//IsNdb() and others look at the sectionCode and subSectionCode to determine data type
if (IsNdb(data) && data[13..17].Trim() == x.ident && data[19..21] == x.icaoCode) result.Add(ArincHelper.LoadNdbEntry(data));
else if (IsVhf(data) && data[13..17].Trim() == x.ident && data[19..21] == x.icaoCode) result.Add(ArincHelper.LoadVhfEntry(data));
else if (IsTacan(data) && data[13..17].Trim() == x.ident) result.Add(ArincHelper.LoadTacanEntry(data));
else if (IsIls(data) && data[13..17].Trim() == x.ident && data[10..12] == x.icaoCode) result.Add(ArincHelper.LoadIlsEntry(data));
else if (IsAirport(data) && data[6..10] == x.ident && data[10..12] == x.icaoCode) result.Add(ArincHelper.LoadAirportEntry(data));
else if (IsRunway(data) && (data[6..10] + data[15..18].Trim()) == x.ident && data[10..12] == x.icaoCode) result.Add(ArincHelper.LoadRunwayEntryAsWaypoint(data));
else if (IsWaypoint(data) && data[13..18].Trim() == x.ident && data[19..21] == x.icaoCode) result.Add(ArincHelper.LoadWaypointEntry(data));
}
}
}
reader.Close();
return result;
}
IsNdb() and the other identifying Methods all look like this:
private static bool IsNdb(string data) => (data[4], data[5]) == ('D', 'B') || (data[4], data[5]) == ('P', 'N');
Some Example data lines would be:
SEURPNEBBREB OP EB004020HOM N50561940E004353360 E0010 WGEBRUSSELS 169641609
SEURP EDDFEDAFRA 0FL100131Y N50015990E008341364E002300364250FFM ED05000 MWGE FRANKFURT/MAIN 331502006
SEURD CHA ED011535VDHB N49551597E009022334CHA N49551597E009022334E0020005292 249WGECHARLIE 867432005
SEURP LFFKLFCFK404 LF0 W F N46262560W000480430 E0000 WGE FK404 331071909
I would like to avoid loading the whole file into memory, as this takes ~400MB of RAM, although it is possible of course.
Thank you in advance for your help.
Edit:
The current solution converts this data file into an SQLite DB, which is then used.
This, however, takes ~3h to convert the file into the DB, which I want to avoid, as the data file is regularly swapped out.
This is why I would like to give this text parsing a try.
As @mjwills suggests, there are better-suited tools for the job; I would keep this data in a database. If I were going to try to make your current code faster, I would try the following: read chunks of the data into an array, process each chunk in a parallel for loop, and exit the loop when you have enough elements. Below is some pseudocode to get you started. I can't write complete code because I don't have your object/file.
List<Waypoint> result = new List<Waypoint>();
var max = 1800000; //set this to the max rows in your file
var allLines = new string[max];
var dataFile = "";
using (StreamReader sr = File.OpenText(dataFile))
{
int x = 0;
while (!sr.EndOfStream)
{
allLines[x] = sr.ReadLine();
x += 1;
if (x % 5000 == 0)
{
var start = x - 5000;
//process only the 5000 lines just read; the lambda parameter needs its own name
Parallel.For(start, x, idx =>
{
//process allLines[idx] here, exit if you have enough elements, and also set
//a flag to exit the while loop
});
}
//you would have to write some code to handle the last group of records that are less than 5k
}
}
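If you want to stay single-threaded and keep memory flat, a different technique from the chunked loop above is to stream the file once and replace the inner foreach over wpData with a hash-set lookup per line. The sketch below is only an outline under assumptions: LineLookup, FindLines and keyOf are hypothetical names, and keyOf stands in for whatever logic builds a comparable key (section, subsection, ident, ICAO code) from a line, which depends on the record layout.

```csharp
using System;
using System.Collections.Generic;

static class LineLookup
{
    // Scans the lines once and keeps those whose key is in wantedKeys.
    // keyOf is supplied by the caller and must mirror the record layout.
    public static List<string> FindLines(
        IEnumerable<string> lines,
        ISet<string> wantedKeys,
        Func<string, string> keyOf)
    {
        var hits = new List<string>();
        foreach (string line in lines)
        {
            if (line.Length != 132) continue;        // same length filter as the question
            if (!wantedKeys.Contains(keyOf(line))) continue;

            hits.Add(line);
            // Keys are unique per the question, so stop once every key was found.
            if (hits.Count == wantedKeys.Count) break;
        }
        return hits;
    }
}
```

You would pass File.ReadLines(dataFile) as lines, which reads lazily instead of loading the whole file. Each line then costs one key extraction plus one hash lookup rather than a pass over all requested waypoints.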
You could use Gigantor for this, which should be many times faster. Gigantor is a C# library for doing fast regex searching of gigantic files. It is available as either source code or a NuGet package.
Gigantor's search benchmark searches 5 GBytes in about 3 seconds finding a total of 105160 matches.
You would just need to convert your parsing code to a regular expression instead.
I'm trying to get away with a slick one liner as I feel it is probably possible.
I'll put my code below and then try to explain a little more what I'm trying to achieve.
for (int p = 0; p < 2; p++)
{
foreach (string player in players[p])
{
if (PlayerSkills[player].streak_count *>* 0) //This line
PlayerSkills[player].streak_count++;
else
PlayerSkills[player].streak_count = 0;
}
}
*(p==0 ? >:<) the comparison operator is chosen depending on p.
Of course what I've written is rubbish. But basically I want to use >0 when p==0, and <0 when p>0. Is there a nice way to achieve this?
Well, you should use what is most readable, even if it is not as concise. That said...
// Invert the count for all but the first player and check for a positive number
if (PlayerSkills[player].streak_count * (p==0 ? 1 : -1) > 0)
I don't know about slick, but the following and/or combination is one line:
if ((p == 0 && PlayerSkills[player].streak_count > 0)
|| (p != 0 && PlayerSkills[player].streak_count < 0))
...
This will only ever do the array index once (due to the p==0 condition occurring first) and so is equivalent to the "ternary" you wrote (albeit a bit more verbose).
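If you would rather literally "choose the comparison operator", one option is to pick a delegate once per value of p and apply it inside the inner loop. This is just a sketch; StreakCheck and ComparisonFor are made-up names, not anything from the original code.

```csharp
using System;

static class StreakCheck
{
    // Returns the test to apply to streak_count for the given p:
    // "greater than zero" for p == 0, "less than zero" otherwise.
    public static Func<int, bool> ComparisonFor(int p)
    {
        if (p == 0) return count => count > 0;
        return count => count < 0;
    }
}
```

In the loop you would call var test = StreakCheck.ComparisonFor(p); once per p and then write if (test(PlayerSkills[player].streak_count)). Whether that reads better than the multiplication trick is a matter of taste.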
p > 0 ? whenGreaterThanZero : whenZeroOrLess ;
E.g.
int p = 1; bool test = p > 0 ? true : false ;
This sets test to true.
In my SQL Server database I have strings stored representing the correct solution to a question. In this string a certain format can be used to represent multiple correct solutions. The format:
possible-text [possibility-1/possibility-2] possible-text
This states that either possibility-1 or possibility-2 is correct. There is no limit on how many possibilities there are (e.g. [ pos-1 / pos-2 / pos-3 / ... ] is possible).
However, a possibility can be null, e.g.:
I am [un/]certain.
This means the answer could be "I am certain" or "I am uncertain".
The format can also be nested in a sentence, e.g.:
I am [[un/]certain/[un/]sure].
The format can also occur multiple times in one sentence, e.g.:
[I am/I'm] [[un/]certain/[/un]sure].
What I want is to generate all the possible combinations. E.g. the above expression should return:
I am uncertain.
I am certain.
I am sure.
I am unsure.
I'm uncertain.
I'm certain.
I'm sure.
I'm unsure.
There is no limit on the nesting, nor on the number of possibilities. If there is only one possible solution, then it will not be in the above format. I'm not sure how to do this.
I have to write this in C#. I think a possible solution could be to write a regular expression that captures the [ / ] format and returns the possible solutions in a list (for every []-pair), and then to generate the possible solutions by going over them in a stack-style way (some sort of recursion and backtracking), but I haven't reached a working solution yet.
I'm at a loss as to how exactly to start on this. If somebody could give me some pointers on how to tackle this problem I'd appreciate it. When I find something I'll add it here.
Note: I noticed there are a lot of similar questions, however the solutions all seem to be specific to the particular problem and I think not applicable to my problem. If however I'm wrong, and you remember a previously answered question that can solve this, could you then tell me? Thanks in advance.
Update: Just to clarify if it was unclear. Every line in code is possible input. So this whole line is input:
[I am/I'm] [[un/]certain/[/un]sure].
This should work. I didn't bother optimizing it or doing error checking (in case the input string is malformed).
using System.Collections.Generic;
using System.Linq;
class Program
{
static IEnumerable<string> Parts(string input, out int i)
{
var list = new List<string>();
int level = 1, start = 1;
i = 1;
for (; i < input.Length && level > 0; i++)
{
if (input[i] == '[')
level++;
else if (input[i] == ']')
level--;
if (input[i] == '/' && level == 1 || input[i] == ']' && level == 0)
{
if (start == i)
list.Add(string.Empty);
else
list.Add(input.Substring(start, i - start));
start = i + 1;
}
}
return list;
}
static IEnumerable<string> Combinations(string input, string current = "")
{
if (input == string.Empty)
{
if (current.Contains('['))
return Combinations(current, string.Empty);
return new List<string> { current };
}
else if (input[0] == '[')
{
int end;
var parts = Parts(input, out end);
return parts.SelectMany(x => Combinations(input.Substring(end, input.Length - end), current + x)).ToList();
}
else
return Combinations(input.Substring(1, input.Length - 1), current + input[0]);
}
static void Main(string[] args)
{
string s = "[I am/I'm] [[un/]certain/[/un]sure].";
var list = Combinations(s);
foreach (var item in list)
System.Console.WriteLine(item); // print each generated sentence
}
}
You should create a parser that reads character by character and builds up a logical tree of the sentence. Once you have the tree, it is easy to generate all possible combinations. There are several parser generators available that you could use, for example ANTLR: http://programming-pages.com/2012/06/28/antlr-with-c-a-simple-grammar/
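For a grammar this small you may not need a parser generator at all. Here is a hand-written recursive-descent sketch of the idea: it expands alternatives while parsing instead of building an explicit tree. The names (AlternativesExpander, Expand) are invented for this example, and it assumes '/' and ']' only ever appear inside brackets and that the brackets are balanced.

```csharp
using System.Collections.Generic;
using System.Linq;

static class AlternativesExpander
{
    public static List<string> Expand(string input)
    {
        int pos = 0;
        return ParseSequence(input, ref pos);
    }

    // sequence := (literal | group)*   -- stops at '/', ']' or end of input
    static List<string> ParseSequence(string s, ref int pos)
    {
        var results = new List<string> { "" };
        while (pos < s.Length && s[pos] != '/' && s[pos] != ']')
        {
            List<string> next;
            if (s[pos] == '[')
            {
                next = ParseGroup(s, ref pos);
            }
            else
            {
                int start = pos;
                while (pos < s.Length && s[pos] != '[' && s[pos] != '/' && s[pos] != ']')
                    pos++;
                next = new List<string> { s.Substring(start, pos - start) };
            }
            // Cross product: every prefix so far combined with every new piece.
            results = results.SelectMany(prefix => next.Select(suffix => prefix + suffix)).ToList();
        }
        return results;
    }

    // group := '[' sequence ('/' sequence)* ']'
    static List<string> ParseGroup(string s, ref int pos)
    {
        pos++; // skip '['
        var options = new List<string>();
        while (true)
        {
            options.AddRange(ParseSequence(s, ref pos));
            if (pos < s.Length && s[pos] == '/') { pos++; continue; }
            break;
        }
        if (pos < s.Length && s[pos] == ']') pos++; // skip ']'
        return options;
    }
}
```

AlternativesExpander.Expand("I am [un/]certain.") yields "I am uncertain." and "I am certain.", and the longer example from the question yields eight sentences. An empty alternative such as the one in [un/] falls out naturally, because an empty sequence expands to a single empty string.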
I'd like to know how to read and parse specific integer values from a text file and add them to a ListBox in C#. For example, I have a text file MyText.txt like this:
<>
101
192
-
399
~
99
128
-
366
~
101
192
-
403
~
And I want to parse the integer value between '-' and '~' and add each of them as an item in the list box, for example:
#listBox1
399
366
403
Notice that each line of values is separated by a carriage return and line feed. By the way, this is data transmitted through RS-232 serial communication from a microcontroller. Sorry, I'm just new to C# programming. Thanks in advance.
Here's a way to do it with LINQ:
bool keep = false;
listBox1.Items.AddRange(
File.ReadLines("MyText.txt")
.Where(l =>
{
if (l == "-") keep = true;
else if (l == "~") keep = false;
else return keep;
return false;
})
.ToArray());
You could use regular expressions like so:
var s = System.Text.RegularExpressions.Regex.Matches(stringtomatch,@"(?<=-\s*)[0-9]+\b(?=\s*~)");
The regex basically looks for a number. It first checks the characters behind it, looking for a dash (-) followed by optional whitespace. Then it matches all the digits until it encounters a non-word character. Finally it checks for optional whitespace and then a required ~ (dunno what that's called). Also, it only returns the number (not the whitespace and symbols).
So basically this method returns a list of matches. You could then use it like so:
for (int i = 0; i < s.Count; i++)
{
listBox1.Items.Add(s[i]);
}
EDIT:
Fixed a typo in the regex and updated the loop (for some reason, foreach doesn't work with the MatchCollection).
You can try running this test script:
var stringtomatch = " asdjasdk jh kjh asd\n-\n123123\n~\nasdasd";
var s = System.Text.RegularExpressions.Regex.Matches(stringtomatch,@"(?<=-\s*)[0-9]+\b(?=\s*~)");
Console.WriteLine(stringtomatch);
for (int i = 0; i < s.Count; i++)
{
listBox1.Items.Add(s[i]);
}
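If you want the numbers as integers rather than Match objects, the same pattern can be wrapped in a small helper. This is a sketch only; MarkerParser and ValuesBetweenMarkers are names made up for illustration, and it assumes every matched number fits in an int.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class MarkerParser
{
    // Returns every number that sits between a '-' line and a '~' line.
    public static List<int> ValuesBetweenMarkers(string text)
    {
        return Regex.Matches(text, @"(?<=-\s*)[0-9]+\b(?=\s*~)")
                    .Cast<Match>()                 // MatchCollection is non-generic on older frameworks
                    .Select(m => int.Parse(m.Value))
                    .ToList();
    }
}
```

You would feed it File.ReadAllText("MyText.txt") and add each value to listBox1.Items. The \s* in the lookarounds also swallows the CR/LF pairs coming from the serial data.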
Try
List<Int32> values = new List<Int32>();
bool open = false;
String[] lines = File.ReadAllLines(fileName);
foreach(String line in lines)
{
if( (!open) && (line == "-") )
{
open = true;
}
else if( (open) && (line == "~") )
{
open = false;
}
else if(open)
{
Int32 v;
if(Int32.TryParse(line, out v))
{
values.Add(v);
}
}
}
Listbox.Items.AddRange(values);
This is an easy piece of code that reads a file, converts the lines to integers (although you could stay with strings) and handles lists. You should start with some basic .NET/C# tutorials.
Edit: To add the values to the listbox you can switch to values.ForEach(v => listbox.Items.Add(v.ToString())) if you use .NET 3.5. Otherwise write a foreach yourself.
Sorry for such a basic question regarding lists, but do we have this feature in C#?
e.g. imagine this Python List:
a = ['a','b','c']
print a[0:2]
['a', 'b']
Is there something like this in C#? I currently have the necessity to test some object properties in pairs. edit: pairs are always of two :P
Imagine a larger (python) list:
a = ['a','a','b','c','d','d']
I need to test, for example, whether a[0] == a[1], whether a[1] == a[2], etc.
How can this be done in C#?
Oh, and a last question: what is the tag (here) i can use to mark some parts of my post as code?
You can use LINQ to create a lazily-evaluated copy of a segment of a list. What you can't do without extra code (as far as I'm aware) is take a "view" on an arbitrary IList<T>. There's no particular reason why this shouldn't be feasible, however. You'd probably want it to be a fixed size (i.e. prohibit changes via Add/Remove) and you could also make it optionally read-only - basically you'd just proxy various calls on to the original list.
Sounds like it might be quite useful, and pretty easy to code... let me know if you'd like me to do this.
Out of interest, does a Python slice genuinely represent a view, or does it take a copy? If you change the contents of the original list later, does that change the contents of the slice? If you really want a copy, the LINQ solutions using Skip/Take/ToList are absolutely fine. I do like the idea of a cheap view onto a collection though...
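For reference, the Skip/Take/ToList copy mentioned above, plus the pairwise comparison from the question, might look roughly like this. ListSlicing, Slice and AdjacentEqual are hypothetical helper names, and Slice assumes non-negative indices (no Python-style negative offsets).

```csharp
using System.Collections.Generic;
using System.Linq;

static class ListSlicing
{
    // Copy of the elements in [start, end), like Python's a[start:end]
    // for non-negative indices.
    public static List<T> Slice<T>(IEnumerable<T> source, int start, int end)
    {
        return source.Skip(start).Take(end - start).ToList();
    }

    // For each adjacent pair (items[i], items[i + 1]), reports whether they are equal.
    public static List<bool> AdjacentEqual<T>(IList<T> items)
    {
        return items.Zip(items.Skip(1),
                         (a, b) => EqualityComparer<T>.Default.Equals(a, b))
                    .ToList();
    }
}
```

Slice returns a copy, not a view, so later changes to the original list do not show up in the result.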
I've been looking for something like Python-Slicing in C# with no luck.
I finally wrote the following string extensions to mimic the python slicing:
using System;
using System.Linq;
static class StringExtensions
{
public static string Slice(this string input, string option)
{
var opts = option.Trim().Split(':').Select(s => s.Length > 0 ? (int?)int.Parse(s) : null).ToArray();
if (opts.Length == 1)
return input[opts[0].Value].ToString(); // only one index
if (opts.Length == 2)
return Slice(input, opts[0], opts[1], 1); // start and end
if (opts.Length == 3)
return Slice(input, opts[0], opts[1], opts[2]); // start, end and step
throw new NotImplementedException();
}
public static string Slice(this string input, int? start, int? end, int? step)
{
int len = input.Length;
if (!step.HasValue)
step = 1;
if (!start.HasValue)
start = (step.Value > 0) ? 0 : len-1;
else if (start < 0)
start += len;
if (!end.HasValue)
end = (step.Value > 0) ? len : -1;
else if (end < 0)
end += len;
string s = "";
if (step < 0)
for (int i = start.Value; i > end.Value && i >= 0; i+=step.Value)
s += input[i];
else
for (int i = start.Value; i < end.Value && i < len; i+=step.Value)
s += input[i];
return s;
}
}
Examples of how to use it:
"Hello".Slice("::-1"); // returns "olleH"
"Hello".Slice("2:-1"); // returns "ll"