Extracting values from a string in C#


I have the following string, from which I would like to retrieve some values:
============================
Control 127232:
map #;-
============================
Control 127235:
map $;NULL
============================
Control 127236:
I want to take only the Control numbers. Is there a way to extract them from the string above into an array like [127232, 127235, 127236]?

One way of achieving this is with regular expressions, which do introduce some complexity but will give the answer you want, with a little LINQ for good measure.
Start with a regular expression to capture, within a group, the data you want:
var regex = new Regex(@"Control\s+(\d+):");
This will look for the literal string "Control" followed by one or more whitespace characters, followed by one or more digits (within a capture group), followed by a literal ":".
Then capture matches from your input using the regular expression defined above:
var matches = regex.Matches(inputString);
Then, using a bit of LINQ, you can turn the matches into an array:
var arr = matches.OfType<Match>()
    .Select(m => long.Parse(m.Groups[1].Value))
    .ToArray();
Now arr is an array of longs containing just the numbers.
Live example here: http://rextester.com/rundotnet?code=ZCMH97137
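
For reference, the three fragments above can be combined into one small helper. This is only a sketch: the class name ControlNumbers and the method name Extract are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class ControlNumbers
{
    // Returns every number that follows the word "Control" and precedes a colon.
    public static long[] Extract(string input)
    {
        var regex = new Regex(@"Control\s+(\d+):");
        return regex.Matches(input)
                    .OfType<Match>()                            // MatchCollection -> IEnumerable<Match>
                    .Select(m => long.Parse(m.Groups[1].Value)) // group 1 holds the digits
                    .ToArray();
    }
}
```

Calling ControlNumbers.Extract(inputString) on the string from the question should give an array holding the three Control numbers.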

Try this (assuming your string is named s and the lines are separated by \n):
List<string> ret = new List<string>();
foreach (string t in s.Split('\n').Where(p => p.StartsWith("Control")))
    ret.Add(t.Replace("Control ", "").Replace(":", ""));
The ret.Add(...) part is not elegant, but it works.
EDITED:
If you want an array, use string[] arr = ret.ToArray();
SYNOPSIS:
I see you're new to this, so I'll try to explain:
s.Split('\n') creates a string[] (one entry for every line in your string)
the .Where(...) part keeps only the strings starting with Control
the foreach part walks through the returned array, taking one string at a time
t.Replace(..) cuts the unwanted text out
ret.Add(...) finally adds the extracted items to the list being returned

Off the top of my head, try this (it's quick and dirty), assuming the text you want to search is in the variable 'text':
List<string> numbers = System.Text.RegularExpressions.Regex.Split(text, "\\D+").ToList();
numbers.RemoveAll(item => item == "");
The first line splits the text on every run of non-digit characters, leaving the numbers as separate items in a list; it can also produce empty strings at the start and end. The second line removes those empty strings, leaving you with a list of the three numbers. If you want to convert that to an array, just add the following line to the end:
var numberArray = numbers.ToArray();

Yes, there is a way. I can't recall a simple built-in for it, so the string has to be parsed to extract these values. The algorithm is as follows:
Find the word "Control" in the string, and where it ends
Find the group of digits after the word
Extract the number with int.Parse or int.TryParse
If you have not reached the end of the string, go back to step one
Implementing this algorithm is almost trivial.
This is the simplest implementation (your string is str):
int i, number, index = 0;
while ((index = str.IndexOf(':', index)) != -1)
{
    i = index - 1;
    while (i >= 0 && char.IsDigit(str[i])) i--;
    if (++i < index)
    {
        number = int.Parse(str.Substring(i, index - i));
        Console.WriteLine("Number: " + number);
    }
    index++;
}
Using LINQ for such a small operation seems like overkill.

Related

how to find text in a string in c#

I am learning .NET and C# on my own.
How can I find out whether a given word exists in a string and, if it does, how many times it is repeated? And if the word is misspelled, how can I detect that and print that it is misspelled?
I know this can be done with collections or LINQ in C#, but here I used the string class and its Contains method, and I am stuck after that.
If this can be done with LINQ, how? LINQ works with collections, right? You need a list in order to use LINQ, but here we are working with a string (a paragraph). How can LINQ be used to find a word in a paragraph?
Kindly help.
Here is what I have tried so far:
string str = "Education is a ray of light in the darkness. It certainly is a hope for a good life. Eudcation is a basic right of every Human on this Planet. To deny this right is evil. Uneducated youth is the worst thing for Humanity. Above all, the governments of all countries must ensure to spread Education";
if (str.Contains("Education")) // only tells me whether the word exists, not how many times
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}

You can turn a string into a string[] by splitting it on a character or string. Then you can use LINQ:
if(str.Split().Contains("makes"))
{
// note that the default Split without arguments also includes tabs and new-lines
}
If you don't care whether it is a word or just a sub-string, you can use str.Contains("makes") directly.
If you want to compare in a case insensitive way, use the overload of Contains:
if(str.Split().Contains("makes", StringComparer.InvariantCultureIgnoreCase)){}

string str = "money makes many makes things";
var strArray = str.Split(' ');
var count = strArray.Count(x => x == "makes");

The simplest way is to use the Split method to split the string into an array of words.
Here is an example:
var words = str.Split(' ');
if(words.Length > 0)
{
foreach(var word in words)
{
if(word.IndexOf("makes", StringComparison.InvariantCultureIgnoreCase) != -1)
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}
}
}
Now, since you just want the number of occurrences of the word, you can use LINQ to do that in a single line like this:
var totalOccurrences = str.Split(' ').Count(x=> x.IndexOf("makes", StringComparison.InvariantCultureIgnoreCase) != -1);
Note that StringComparison.InvariantCultureIgnoreCase is required if you want a case-insensitive comparison.
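
Splitting on spaces leaves punctuation attached to words (for example "darkness."), and IndexOf also matches substrings. As an alternative sketch, a regular expression with word boundaries counts whole words only. The names WordCounter and CountWord are invented for this example, and it assumes the search word starts and ends with a letter or digit so that \b behaves as expected.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class WordCounter
{
    // Counts case-insensitive whole-word occurrences of 'word' in 'text'.
    public static int CountWord(string text, string word)
    {
        // Regex.Escape keeps any special characters in 'word' literal;
        // \b anchors the match at word boundaries on both sides.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

With the sample sentence used above, WordCounter.CountWord("money makes many makes things", "makes") counts the two whole-word matches. It does not detect misspellings such as "Eudcation"; those simply do not match.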

C# Removing Last Element of a string Array?

I am parsing a set of coordinates from an XML file. Each node will have coordinates like:
-82.5,34.1,0.000 -82.6,34.2,0.000
In the code below, the raw_coords variable is already assigned the above value and I am trying to split it into the array lnglatset, which does look okay.
string[] lnglatset = raw_coords.Split(' '); // yields e.g. [0] = -82.5,34.1,0.000; the trailing 0.000 needs to be removed
foreach (string lnglat in lnglatset)
{
Console.WriteLine(lnglat);//-82.5,34.1,0.000; looks fine
}
From the above, the final value needed would be:
coords = "-82.5 34.1, -82.6 34.2";//note the space between lng/lat
But how do I remove the junk 0.000 value from each element of the array and put a space, instead of a comma, between the lng and lat values in each element? I tried calling Remove() on lnglat, but that was not allowed within the foreach loop. Thanks!

You can take all parts except the last one using the Take method:
var parts = raw_coords.Split(' ')
    .Select(x => x.Split(','))
    .Select(x => string.Join(" ", x.Take(x.Length - 1)));
var result = string.Join(", ", parts);
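
As a self-contained variant of the same idea, here is a sketch that keeps only the first two values of each triple. The names CoordinateFormatter and ToLngLatList are invented for illustration, and the code assumes every element contains at least a longitude and a latitude.

```csharp
using System;
using System.Linq;

// Hypothetical helper; the names are illustrative only.
public static class CoordinateFormatter
{
    // Turns "lng,lat,alt lng,lat,alt" into "lng lat, lng lat".
    public static string ToLngLatList(string rawCoords)
    {
        var pairs = rawCoords
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // one entry per coordinate triple
            .Select(triple => triple.Split(','))                          // [lng, lat, alt]
            .Select(parts => parts[0] + " " + parts[1]);                  // keep lng and lat only
        return string.Join(", ", pairs);
    }
}
```

For the sample input in the question, CoordinateFormatter.ToLngLatList(raw_coords) should produce the string the asker wanted, with a space between lng and lat and a comma between pairs.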

In a single line :
String result = String.Join(" ", raw_coords.Split(' ', ',')
.Select(i => double.Parse(i))
.Where(i => i != 0).Select( i => i.ToString()));
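It removes each 0.000 element along with the spaces and commas. Note that it joins all remaining values with single spaces (giving -82.5 34.1 -82.6 34.2) and would also drop any genuine coordinate equal to 0.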

You can't alter the members of an IEnumerable during a foreach. Instead, you can just skip the last member when splitting each coordinate:
lnglat.Split(',').Take(2).ToArray()

As others mentioned, you cannot modify the iteration variable of a foreach. I learnt it the hard way and ended up using a simple for loop instead of foreach:
for (int i = 0; i < lnglatset.Length; i++)
{
    // lnglatset[i] can be reassigned here
}

You can use the Enumerable.Last() extension method from System.Linq.
string lastItemOfSplit = aString.Split(new char[] { '\\', '/' }).Last();

auto detect tag within a text

Is there any library or algorithm that can automatically detect tags in a text (ignoring the common words of the chosen language)?
Something like this:
string[] keywords = GetKeyword("Your order is num #0123456789");
and keywords[] would contain "order" and "#0123456789"?
Does it exist? Or does the user have to select all the tags of every document by hand every time?

foreach(string keyword in keywords) { // where keywords is a List<string>
if ("Your order is num #0123456789".Contains(keyword)) {
keywordsPresent.Add(keyword); // where keywordsPresent is a List<string>
}
}
return keywordsPresent;
Note that the above does not cater for your #0123456789; for that, add some more logic to find the index of the #, or something similar.

Sorry, I misunderstood the question. If you want to look for specific words, the algorithm will depend on your strings. For example, you can use Split() to generate an array of words from one string, and then work with that, like this:
string[] words = "Your order is num #0123456789".Split(' ');
string orderNumber = "";
if (words.Contains("order") && words.Any(w => w.StartsWith("#")))
{
    orderNumber = words.First(w => w.StartsWith("#"));
}
This will first generate an array of words from "Your order is num #0123456789"; then, if it contains the word "order", it will find a word that starts with "#" and select that.

I think a lot of different algorithms can be used. Some of them are simple, others are very complex. I can suggest the following basic approach:
Split the whole text into an array of words.
Remove stop words from the array. (Google "stop words list" to get a full list of stop words.)
Walk through the array and count the occurrences of each word.
Sort the words according to their 'weight' in the array.
Choose the required number of tags.
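
The steps above can be sketched roughly as follows. Everything here is an assumption made for illustration: the class and method names, the tiny stop-word list (a real list would be much longer), the set of separator characters, and the use of plain occurrence count as the 'weight'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; names and stop-word list are illustrative only.
public static class TagExtractor
{
    // Tiny sample stop-word list, compared case-insensitively.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "a", "an", "and", "is", "of", "the", "to", "your" },
        StringComparer.OrdinalIgnoreCase);

    public static List<string> GetKeywords(string text, int maxTags)
    {
        return text
            // 1. split the text into words
            .Split(new[] { ' ', '\t', '\r', '\n', '.', ',', ';', '!', '?' },
                   StringSplitOptions.RemoveEmptyEntries)
            // 2. remove stop words
            .Where(word => !StopWords.Contains(word))
            // 3. count each word, ignoring case
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            // 4. sort by weight (here simply the occurrence count)
            .OrderByDescending(group => group.Count())
            // 5. keep the requested number of tags
            .Take(maxTags)
            .Select(group => group.Key)
            .ToList();
    }
}
```

For "Your order is num #0123456789" this keeps "order", "num" and "#0123456789", since "num" is not in the sample stop-word list; how well it works in practice depends mostly on the quality of that list.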

Split string into array then loop, in C#

I have Googled this a LOT but my C# skills are pretty terrible and I just can't see why this isn't working.
I have a string which comes from a session object, which I don't have any control over setting. The string contains some sentences separated by six underscores. e.g.:
Sentence number one______Sentence number two______Sentence number three etc
I want to split this string by the six underscores and return each item in the resultant array.
Here's the code I have:
string itemsPlanner = HttpContext.Current.Session["itemsPlanner"].ToString();
string[] arrItemsPlanner = itemsPlanner.Split(new string[] { "______" }, StringSplitOptions.None);
foreach (string i in arrItemsPlanner)
{
newItemsPlanner += "debug1: " + i; //This returns what looks like a number, as I'd expect, starting at zero and iterating by one each loop.
int itemNumber;
try
{
itemNumber = Convert.ToInt32(i);
string sentence = arrItemsPlanner[itemNumber].ToString();
}
catch (FormatException e)
{
return "Input string is not a sequence of digits.";
}
catch (OverflowException e)
{
return "The number cannot fit in an Int32.";
}
finally
{
return "Fail!"
}
}
Whenever I run this, the session is retrieved successfully, but the line itemNumber = Convert.ToInt32(i); fails every time and I get an error saying "Input string is not a sequence of digits."
Can anyone point me in the right direction with this please?
Many thanks!

If you just want to get each sentence and do something with it, this will do the trick:
string itemsPlanner = HttpContext.Current.Session["itemsPlanner"].ToString();
string[] arrItemsPlanner = itemsPlanner.Split(new[] { "______" }, StringSplitOptions.None);
foreach (string i in arrItemsPlanner)
{
// Do something with each sentence
}
You can split on a string as well as on a char (or char[]). In the foreach, 'i' will be the value of the sentence, so you can concatenate it or process it or do whatever :)
If I've misunderstood, my apologies. I hope that helps :)

In your case i is not a number; it is the actual element of the array. A foreach loop has no index variable; you only have access to the element currently being iterated, through i.
So on the first loop iteration i is "Sentence number one", then "Sentence number two".
If you want the number, you have to use a for loop instead.
Something like this:
for (int i = 0; i < arrItemsPlanner.Length; i++)
{
    // on the first iteration here
    // i is 0
    // and arrItemsPlanner[i] is "Sentence number one"
}
Hope it helps.

From your example i does not contain a valid integer number, thus Convert.ToInt32 fails. The foreach loop sets i with the current item in the sentences array, so basically i always contains one of the sentences in your main string. If you want i to be the index in the array, use a for loop.

Example from MSDN.
string words = "This is a list of words______with a bit of punctuation" +
"______a tab character.";
string [] split = words.Split(new Char [] {'_'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in split) {
if (s.Trim() != "")
Console.WriteLine(s);
}

Do you need to trim your string before converting it to a number? If that's not it, you may want to use Int32.TryParse().

In your sample code, foreach (string i in arrItemsPlanner), 'i' will get the string values of arrItemsPlanner one by one.
For example, on the first iteration it will hold 'Sentence number one', which is obviously not a valid int, hence your conversion failed.

i only contains one of the string fragments, which will be: Sentence number one, Sentence number two and Sentence number three. If you want it to contain an int representing the index, use:
1) a for loop
2) an int defined before your foreach and incremented (myInt++) inside the foreach body
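
A minimal sketch of the first option, pairing each sentence with its index. The class name SentenceSplitter and method name Describe are made up for this example; the separator is the six underscores from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the names are illustrative only.
public static class SentenceSplitter
{
    // Splits on six underscores and labels each sentence with its index.
    public static List<string> Describe(string itemsPlanner)
    {
        string[] sentences = itemsPlanner.Split(new[] { "______" }, StringSplitOptions.None);
        var lines = new List<string>();
        for (int index = 0; index < sentences.Length; index++)
        {
            // 'index' is the position; sentences[index] is the text itself
            lines.Add(index + ": " + sentences[index]);
        }
        return lines;
    }
}
```

Here the loop variable is the position in the array, so no Convert.ToInt32 call is needed at all.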

Regex to extract info out of large html source?

Among lots of HTML source I have some elements like this:
<option value=15>Bahrain - Manama</option>
<option value=73>Bangladesh - Dhaka</option>
<option value=46>Barbados - Bridgetown</option>
<option value=285>Belarus - Minsk</option>
<option value=48>Belgium - Brussels</option>
<option value=36>Belize - Belmopan</option>
I also have a dictionary declared like Dictionary<string, int> Places = new Dictionary<string, int>();
What I want to do is extract the city name out of the HTML and put it into Places as the key, and extract the number code and store it as the int value. For the first one I would call Places.Add("Manama", 15); The country name can be ignored. The idea, though, is to scan the HTML source and add the cities automatically.
This is what I have so far:
string[] temp = htmlContent.Split('\n');
List<string> temp2 = new List<string>();
foreach (string s in temp)
{
if (s.Contains("<option value="))
{
string t = s.Replace("option value=", "");
temp2.Add(t);
}
}
This cuts out some of the text but then I more or less get stuck wondering how to extract the relevant parts from the text. It's really bad I know but I'm learning :(

Don't use a regular expression; use HtmlAgilityPack. You can then use LINQ to retrieve your option elements and build up your dictionary in a one-liner:
HtmlDocument doc = new HtmlDocument();
//remove "option" special handling otherwise inner text won't be parsed correctly
HtmlNode.ElementsFlags.Remove("option");
doc.Load("test.html");
var Places = doc.DocumentNode
.Descendants("option")
.ToDictionary(x => x.InnerText.Split('-')[1].Trim(),
x => int.Parse(x.Attributes["value"].Value));
For extracting the city name from the option's text, the above uses string.Split(), splitting on the separating -, taking the second (city) string and trimming any leading or trailing whitespace. The value attribute is parsed with int.Parse so that the result matches Dictionary<string, int>.

If the only relevant data you are looking for is within the option elements, you can split the source on the opening tag:
string[] options = Regex.Split(theSource, "<option value="); // splits up the source which is downloaded from the url
That way you are instantly faced with an array of strings whose first few characters are your int (options[0] is whatever precedes the first option, so start at x = 1). To get the number, loop with a pointer until you reach the closing ">":
int y = 0; // pointer
string theString = "";
while (options[x].Substring(y, 1) != ">") // check to see if the number has finished
{
    theString += options[x].Substring(y, 1);
    y++;
}
int theInt = int.Parse(theString);
If the numbers are always two characters long, you can take options[x].Substring(0, 2) instead, which is quicker than a loop.
Then I would re-use the string theString for the city name:
string[] place = Regex.Split(options[x], " - "); // split between the country and the city
theString = place[1].Substring(0, place[1].IndexOf('<')); // the city ends where </option> begins
And then add them with
Places.Add(theString, theInt);
This should work. If the code doesn't work straight away, the algorithm will; just make sure the spelling is right and that the variables are doing what they should.
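
Since the question asks about a regex specifically, here is a hedged sketch of that route. It assumes the markup looks exactly like the sample (unquoted value attributes, "Country - City" text, no nested tags); for arbitrary HTML a real parser is the safer choice. The names PlaceParser and Parse are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class PlaceParser
{
    // Maps city name -> numeric code for every matching <option> element.
    public static Dictionary<string, int> Parse(string html)
    {
        var places = new Dictionary<string, int>();
        // Group 1: the digits of the value attribute.
        // Group 2: the text after " - ", i.e. the city.
        var regex = new Regex(@"<option value=(\d+)>[^<]*? - ([^<]+)</option>");
        foreach (Match m in regex.Matches(html))
        {
            // The indexer overwrites on duplicate city names instead of throwing.
            places[m.Groups[2].Value.Trim()] = int.Parse(m.Groups[1].Value);
        }
        return places;
    }
}
```

With the sample markup from the question, PlaceParser.Parse(htmlContent)["Manama"] would be 15.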
