Using LINQ to parse the numbers from a string - c#

Is it possible to write a query where we get all those characters that could be parsed into int from any given string?
For example we have a string like: "$%^DDFG 6 7 23 1"
Result must be "67231"
And, slightly harder: can we get only the first three numbers?

This will give you your string
string result = new String("y0urstr1ngW1thNumb3rs".
Where(x => Char.IsDigit(x)).ToArray());
And for the first 3 chars use .Take(3) before ToArray()

The following should work.
var myString = "$%^DDFG 6 7 23 1";
//note that this is still an IEnumerable object and will need
// conversion to int, or whatever type you want.
var myNumber = myString.Where(a=>char.IsNumber(a)).Take(3);
It's not clear if you want 23 to be considered a single number sequence, or 2 distinct numbers. My solution above assumes you want the final result to be 672
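To make the two readings concrete, here is a minimal sketch of both. The class and method names (DigitExtractor, FirstDigits, FirstNumbers) are made up for illustration and are not from any of the answers. Note that char.IsDigit also accepts non-ASCII decimal digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative only.
public static class DigitExtractor
{
    // Reading 1: every digit counts on its own, so "23" contributes '2' and '3'.
    public static string FirstDigits(string input, int count)
    {
        return new string(input.Where(c => char.IsDigit(c)).Take(count).ToArray());
    }

    // Reading 2: each run of ASCII digits counts as one number, so "23" stays together.
    public static string FirstNumbers(string input, int count)
    {
        var runs = Regex.Matches(input, "[0-9]+")
                        .Cast<Match>()
                        .Take(count)
                        .Select(m => m.Value);
        return string.Concat(runs);
    }
}
```

For "$%^DDFG 6 7 23 1" and a count of 3, FirstDigits gives "672" while FirstNumbers gives "6723".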

public static string DigitsOnly(string strRawData)
{
return Regex.Replace(strRawData, "[^0-9]", "");
}

string testString = "$%^DDFG 6 7 23 1";
string cleaned = new string(testString.ToCharArray()
.Where(c => char.IsNumber(c)).Take(3).ToArray());
If you want to use a white list (not always numbers):
char[] acceptedChars = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
string cleaned = new string(testString.ToCharArray()
.Where(c => acceptedChars.Contains(c)).Take(3).ToArray());

How about something like this?
var yourstring = "$%^DDFG 6 7 23 1";
var selected = yourstring.ToCharArray().Where(c=> c >= '0' && c <= '9').Take(3);
var reduced = yourstring.Where(char.IsDigit).Take(3);

Regex:
private int ParseInput(string input)
{
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(@"\d+");
string valueString = string.Empty;
foreach (System.Text.RegularExpressions.Match match in r.Matches(input))
valueString += match.Value;
return Convert.ToInt32(valueString);
}
And even slightly harder: can we get only the first three numbers?
private static int ParseInput(string input, int take)
{
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(@"\d+");
string valueString = string.Empty;
foreach (System.Text.RegularExpressions.Match match in r.Matches(input))
valueString += match.Value;
valueString = valueString.Substring(0, Math.Min(valueString.Length, take));
return Convert.ToInt32(valueString);
}
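One thing to watch with the Convert.ToInt32 approach: it throws a FormatException when the input contains no digits (the string is empty) and an OverflowException when the digits do not fit in an int. A possible variant, sketched here with a made-up class and method name, uses int.TryParse so the caller gets a bool instead of an exception.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: names are illustrative only.
public static class SafeDigitParser
{
    // Collects up to 'take' ASCII digits and tries to parse them as an int.
    // Returns false (and value = 0) when there are no digits or the number overflows.
    public static bool TryParseDigits(string input, int take, out int value)
    {
        string digits = new string(
            input.Where(c => c >= '0' && c <= '9').Take(take).ToArray());
        return int.TryParse(digits, NumberStyles.None,
                            CultureInfo.InvariantCulture, out value);
    }
}
```

With "$%^DDFG 6 7 23 1" and take = 3 this returns true and sets value to 672; with "no digits here" it returns false.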

string strRawData = "12#$%33fgrt$%$5";
string[] arr = Regex.Split(strRawData, "[^0-9]");
int a1 = 0;
foreach (string value in arr) { Console.WriteLine("line no." + a1 + " =" + value); a1++; }
Output:
line no.0 =12
line no.1 =
line no.2 =
line no.3 =33
line no.4 =
line no.5 =
line no.6 =
line no.7 =
line no.8 =
line no.9 =
line no.10 =5
Press any key to continue . . .
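The empty lines in that output appear because Regex.Split produces an empty entry between every pair of adjacent non-digit characters. If only the digit runs are wanted, matching the digits instead of splitting on the non-digits avoids the empties. A small sketch (the class and method names are invented for this example):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative only.
public static class DigitRuns
{
    // Returns each maximal run of ASCII digits, in order of appearance.
    public static string[] Extract(string raw)
    {
        return Regex.Matches(raw, "[0-9]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For "12#$%33fgrt$%$5" this returns the three entries 12, 33 and 5.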

Related

Regex replace all occurrences with something that is "derived" from the part to be replaced

I have the following line from a RTF document
10 \u8314?\u8805? 0
(which says in clear text 10 ⁺≥ 0). You can see that the special characters are escaped with \u followed by the decimal unicode and by a question mark (which is the replacement character which should be printed in the case that displaying the special character is not possible). I want to have the text in a string variable in C# which is equivalent to the following variable:
string expected = "10 \u207A\u2265 0";
In the debugger I want the variable to show the value 10 ⁺≥ 0. I therefore must replace every occurrence with the corresponding hexadecimal unicode (#207A = 8314 and #2265 = 8805). What is the simplest way to accomplish this with regular expressions?
The code is:
string str = @"10 \u8314?\u8805? 0";
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
string value = match.Groups[1].Value;
string hex = @"\u" + int.Parse(value).ToString("X4");
return hex;
});
This will return
string line = @"10 \u207A\u2265 0";
so the \u207A\u2265 won't be unescaped.
Note that the value is first converted to a number (int.Parse(value)) and then converted to a fixed-notation 4 digits hex number (ToString("X4"))
Or
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
string value = match.Groups[1].Value;
char ch = (char)int.Parse(value);
return ch.ToString();
});
This will return
string line = @"10 ⁺≥ 0";
If I understood your question correctly, you want to parse the unicode representation of the RTF to a C# string.
So, the one-liner solution looks like this
string result = Regex.Replace(line, @"\\u(\d+?)\?", new MatchEvaluator(m => ((char)Convert.ToInt32(m.Groups[1].Value)).ToString()));
But I suggest to use a cleaner code:
private static string ReplaceRtfUnicodeChar(Match match) {
int number = Convert.ToInt32(match.Groups[1].Value);
char chr = (char)number;
return chr.ToString();
}
public static void Main(string[] args)
{
string line = @"10 \u8314?\u8805? 0";
var r = new Regex(@"\\u(\d+?)\?");
string result = r.Replace(line, new MatchEvaluator(ReplaceRtfUnicodeChar));
Console.WriteLine(result); // Displays 10 ⁺≥ 0
}
You have to use MatchEvaluator:
string input = @"10 \u8314?\u8805? 0";
Regex reg = new Regex(@"\\u([A-Fa-f0-9]+)\?",RegexOptions.Multiline);
string result = reg.Replace(input, delegate(Match m) {
return ConvertToWhatYouWant(m.Value);
});
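Putting the pieces together, a self-contained sketch of the decimal-to-character replacement could look like the following. The class and method names are invented here. It assumes the fallback is always a single '?' (real RTF can change the number of fallback characters with \ucN, which this sketch ignores), and it also accepts negative values, since RTF writes code units above 32767 as negative 16-bit numbers.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative only.
public static class RtfUnicode
{
    // Replaces every \uN? escape (N is a decimal, possibly negative, number)
    // with the UTF-16 code unit it denotes.
    // Assumption: the fallback text is exactly one '?' character.
    public static string Decode(string rtf)
    {
        return Regex.Replace(rtf, @"\\u(-?[0-9]+)\?", m =>
        {
            int code = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            if (code < 0)
            {
                code += 65536; // map the signed 16-bit form back to 0..65535
            }
            return ((char)code).ToString();
        });
    }
}
```

Called with @"10 \u8314?\u8805? 0", Decode returns "10 ⁺≥ 0".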

Extracting Formula from String

I have to extract all variables from Formula
Fiddle for below problem
eg. (FB+AB+ESI) / 12
Output {FB,AB,ESI}
Code written so far
var length = formula.Length;
List<string> variables = new List<string>();
List<char> operators = new List<char> { '+', '-', '*', '/', ')', '(', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
int count = 0;
string character = string.Empty;
for (int i = 0; i < length; i++)
{
if (!operators.Contains(formula[i]))
character += formula[i];
else
{
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
character = string.Empty;
count = i;
}
}
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
return variables;
Output of the Method is {FB,AB,ESI} which is correct
My problem is when a variable name contains digits, e.g.
(FB1+AB1)/100
Expected Output : {FB1,AB1}
But my method returns {FB,AB}
If variable names must start with
a letter A..Z, a..z
and can contain
letters A..Z, a..z
digits 0..9
underscores _
you can use regular expressions:
String source = "(FB2+a_3B+EsI) / 12";
String pattern = @"[A-Za-z][A-Za-z0-9_]*";
// output will be "{FB2,a_3B,EsI}"
String output = "{" + String.Join(",",
Regex.Matches(source, pattern)
.OfType<Match>()
.Select(item => item.Value)) + "}";
In case you need a collection, say an array of variable's names, just modify the Linq:
String[] names = Regex.Matches(source, pattern)
.OfType<Match>()
.Select(item => item.Value)
.ToArray();
However, what is implemented is just a kind of naive tokenizer: you have to separate "variable names" found from function names, class names, check if they are commented out etc.
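As a rough sketch of that regular-expression approach wrapped into a reusable method: the class and method names below are invented, the pattern assumes names start with a letter and continue with letters, digits or underscores, and repeated names are reported once. It is still only a naive tokenizer, as noted above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative only.
public static class FormulaVariables
{
    // Assumed naming rule: a letter followed by letters, digits or underscores.
    private static readonly Regex Name = new Regex("[A-Za-z][A-Za-z0-9_]*");

    // Returns each variable name once, in order of first appearance.
    public static List<string> Extract(string formula)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (Match match in Name.Matches(formula))
        {
            if (seen.Add(match.Value))
            {
                result.Add(match.Value);
            }
        }
        return result;
    }
}
```

Extract("(FB1+AB1)/100") yields FB1 and AB1, and Extract("(FB+AB+ESI) / 12") yields FB, AB and ESI.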
Have changed your code to do what you asked, but not sure about the approach of the solution, seeing that bracket and operator precedence is not taken into consideration.
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
string formula = "AB1+FB+100";
var length = formula.Length;
List<string> variables = new List<string>();
List<char> operators = new List<char>{'+', '-', '*', '/', ')', '('};
List<char> numerals = new List<char>{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
int count = 0;
string character = string.Empty;
char prev_char = '\0';
for (int i = 0; i < length; i++)
{
bool is_operator = operators.Contains(formula[i]);
bool is_numeral = numerals.Contains(formula[i]);
bool is_variable = !(is_operator || is_numeral);
bool was_variable = character.Contains(prev_char);
if (is_variable || (was_variable && is_numeral) )
character += formula[i];
else
{
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
character = string.Empty;
count = i;
}
prev_char = formula[i];
}
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
foreach (var item in variables)
Console.WriteLine(item);
Console.WriteLine();
Console.WriteLine();
}
}
Maybe also consider something like Math-Expression-Evaluator (on nuget)
Here is how you could do it with Regular Expressions.
Regex regex = new Regex(@"([A-Z])\w+");
List<string> matchedStrings = new List<string>();
foreach (Match match in regex.Matches("(FB1+AB1)/100"))
{
matchedStrings.Add(match.Value);
}
This will create a list of strings of all the matches.
Without regex, you can split on the actual operators (not numbers), and then remove any items that begin with a number:
public static List<string> GetVariables(string formula)
{
if (string.IsNullOrWhiteSpace(formula)) return new List<string>();
var operators = new List<char> { '+', '-', '*', '/', '^', '%', '(', ')' };
return formula
.Split(operators.ToArray(), StringSplitOptions.RemoveEmptyEntries)
// trim so that " 12" and whitespace-only pieces are not mistaken for variables
.Select(operand => operand.Trim())
.Where(operand => operand.Length > 0 && !char.IsDigit(operand[0]))
.ToList();
}
You can do it this way, just optimize the code as you want.
string ss = "(FB+AB+ESI) / 12";
string[] spl = ss.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
string final = spl[0].Replace("(", "").Replace(")", "").Trim();
string[] entries = final.Split(new char[] {'+'}, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sbFinal = new StringBuilder();
sbFinal.Append("{");
foreach(string en in entries)
{
sbFinal.Append(en + ",");
}
string finalString = sbFinal.ToString().TrimEnd(',');
finalString += "}";
What you are trying to do is an interpreter.
I can't give you the whole code but what I can give you is a head start (it will require a lot of coding).
First, learn about reverse polish notation.
Second, you need to learn about stacks.
Third, you have to apply both to get what you want to interpret.
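To give an idea of how the stack and reverse Polish notation fit together, here is a minimal evaluator for an expression that is already in postfix form, with tokens separated by spaces. It is only a sketch: the class name is invented, it handles the four basic operators on numbers, and converting an infix formula such as (FB+AB+ESI) / 12 to postfix and substituting variable values are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: names are illustrative only.
public static class RpnCalculator
{
    // Evaluates a space-separated postfix expression such as "3 4 + 2 *".
    public static double Evaluate(string expression)
    {
        var stack = new Stack<double>();
        string[] tokens = expression.Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            double number;
            if (double.TryParse(token, NumberStyles.Float,
                                CultureInfo.InvariantCulture, out number))
            {
                stack.Push(number); // operands wait on the stack
                continue;
            }

            if (stack.Count < 2)
            {
                throw new FormatException("Operator '" + token + "' needs two operands.");
            }

            // An operator consumes the two most recent operands.
            double right = stack.Pop();
            double left = stack.Pop();
            switch (token)
            {
                case "+": stack.Push(left + right); break;
                case "-": stack.Push(left - right); break;
                case "*": stack.Push(left * right); break;
                case "/": stack.Push(left / right); break;
                default: throw new FormatException("Unknown token '" + token + "'.");
            }
        }

        if (stack.Count != 1)
        {
            throw new FormatException("Malformed expression.");
        }
        return stack.Pop();
    }
}
```

Evaluate("3 4 + 2 *") returns 14, the postfix form of (3 + 4) * 2.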

How to retrieve a substring based on a first list match

In a string I need to recover a 7 char substring based on the first match from any item in a list. If a match is not made it should return an empty string.
I have the following code:
List<string> myList = new List<string>()
{
"TNCO",
"TNCB",
"TNIT"
};
string sample = "TNSD102, WHRK301, TNIT301, YTRE234";
//doesn't give an index
bool anyfound = myList.Any(w => sample.Contains(w));
//code that needs replacing
string code = sample.Substring(sample.IndexOf("TNC"), 7);
if (code == "")
{
code = sample.Substring(sample.IndexOf("TNIT"), 7);
}
The list is never likely to be more than 35-40 items and the strings < 50 chars.
Anyone able to point me in the right direction?
string val1 = (sample.Split(',').FirstOrDefault(w => myList.Any(m => w.Contains(m))) ?? string.Empty).Trim();
This gives you an IEnumerable of all matches:
var matches = from code in sample.Split(',')
from w in myList
where code.Trim().StartsWith(w)
select code;
To get the first value use FirstOrDefault. Then use the coalesce operator ?? to return an empty string if there was no match.
string firstMatch = (matches.FirstOrDefault() ?? "").Trim();
With data sets this small, you can simply split the string and search for the first match:
// split the sample string into separate entries
var entries = sample.Split(new char[] {',', ' '},
StringSplitOptions.RemoveEmptyEntries);
// find the first entry starting with any allowed prefix
var firstMatch = entries.FirstOrDefault (
e => myList.Any (l => e.StartsWith(l)));
// FirstOrDefault returns null if there are no matches
if (firstMatch == null)
Console.WriteLine("No match!");
else
Console.WriteLine(firstMatch);
Example output (DEMO):
TNIT301
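If the empty-string fallback asked for in the question is needed, the same idea can be wrapped in a small method. This is only a sketch; the class and method names are made up, and it assumes entries are separated by commas and/or spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: names are illustrative only.
public static class CodeFinder
{
    // Returns the first entry of 'sample' that starts with any of the prefixes,
    // or an empty string when no entry matches.
    public static string FirstMatch(string sample, IEnumerable<string> prefixes)
    {
        string[] entries = sample.Split(
            new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
        List<string> prefixList = prefixes.ToList();

        return entries.FirstOrDefault(
                   e => prefixList.Any(p => e.StartsWith(p, StringComparison.Ordinal)))
               ?? string.Empty;
    }
}
```

With the sample string and list from the question, FirstMatch returns "TNIT301"; with a list that matches nothing it returns "".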
List<string> myList = new List<string> { "TNCO", "TNCB", "TNIT" };
string sample = "TNSD102, WHRK301, TNIT301, YTRE234";
string[] sampleItems = sample.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var results = myList
.Select(prefix => sampleItems
.FirstOrDefault(item => item.StartsWith(prefix)) ?? "");
FindIndex returns 2 here, which is the position of the first matching prefix (TNIT) in myList, not a position in the sample string.
int keyIndex = myList.FindIndex(w => sample.Contains(w));
Use that prefix to cut the 7-character code out of the sample string, or fall back to an empty string when nothing matched (keyIndex is -1):
string subStrValue = keyIndex < 0 ? "" : sample.Substring(sample.IndexOf(myList[keyIndex]), 7);
For the sample above this yields TNIT301.

Retrieve String Containing Specific substring C#

I am having an output in string format like following :
"ABCDED 0000A1.txt PQRSNT 12345"
I want to retrieve the substring(s) containing .txt from the above string, e.g. for the string above it should return 0000A1.txt.
Thanks
You can either split the string at whitespace boundaries like it's already been suggested or repeatedly match the same regex like this:
var input = "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO";
var match = Regex.Match (input, @"\b([\w\d]+\.txt)\b");
while (match.Success) {
Console.WriteLine ("TEST: {0}", match.Value);
match = match.NextMatch ();
}
Split will work if spaces are the separator. If you use other separators, you can add them as needed.
string input = "ABCDED 0000A1.txt PQRSNT 12345";
string filename = input.Split(' ').FirstOrDefault(f => System.IO.Path.HasExtension(f));
filename is "0000A1.txt", and this will work for any extension.
You may use c#, regex and pattern, match :)
Here is the code, plug it in try. Please comment.
string test = "afdkljfljalf dkfjd.txt lkjdfjdl";
string ffile = Regex.Match(test, @"([a-z0-9]+\.txt)").Groups[1].Value;
Console.WriteLine(ffile);
Reference: regexp
I did something like this:
string subString = "";
char period = '.';
char[] chArString;
int iSubStrIndex = 0;
if (myString != null)
{
chArString = new char[myString.Length];
chArString = myString.ToCharArray();
for (int i = 0; i < myString.Length; i ++)
{
if (chArString[i] == period)
iSubStrIndex = i;
}
subString = myString.Substring(iSubStrIndex);
}
Hope that helps.
First split your string into an array using
char[] whitespace = new char[] { ' ', '\t' };
string[] ssizes = myStr.Split(whitespace);
Then find .txt in the array...
// Find the first element containing .txt.
//
string value1 = Array.Find(ssizes,
element => element.Contains(".txt"));
Now value1 will hold "0000A1.txt".
Happy coding.
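Since the question asks for substring(s), plural, here is a sketch that collects every whitespace-separated token ending in a given extension. The class and method names are invented; it assumes tokens are separated by spaces or tabs and compares the extension case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: names are illustrative only.
public static class FileTokenFinder
{
    // Returns all tokens that end with 'extension' (e.g. ".txt"), ignoring case.
    // A token consisting of the extension alone is skipped.
    public static List<string> WithExtension(string input, string extension)
    {
        return input
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => token.Length > extension.Length &&
                            token.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

For "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO" and ".txt" this returns 0000A1.txt and THE.txt.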

Get String (Text) before next upper letter

I have the following:
string test = "CustomerNumber";
or
string test2 = "CustomerNumberHello";
the result should be:
string result = "Customer";
The first word from the string is the result, the first word goes until the first uppercase letter, here 'N'
I already tried some things like this:
var result = string.Concat(s.Select(c => char.IsUpper(c) ? " " + c.ToString() : c.ToString()))
.TrimStart();
But without success, hope someone could offer me a small and clean solution (without RegEx).
The following should work:
var result = new string(
test.TakeWhile((c, index) => index == 0 || char.IsLower(c)).ToArray());
You could just go through the string to see which values (ASCII) are below 97 and remove the end. Not the prettiest or LINQiest way, but it works...
string test2 = "CustomerNumberHello";
for (int i = 1; i < test2.Length; i++)
{
if (test2[i] < 97)
{
test2 = test2.Remove(i, test2.Length - i);
break;
}
}
Console.WriteLine(test2); // Prints Customer
Try this
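private static string GetFirstWord(string source)
{
// IndexOfAny returns -1 when no uppercase letter follows the first character;
// in that case the whole string is the first word.
int end = source.IndexOfAny("ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray(), 1);
return end < 0 ? source : source.Substring(0, end);
}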
Use the [A-Z][a-z]+ regex; it splits the string into pieces that each start with a capital letter. Here is an example:
string regex = "[A-Z][a-z]+";
MatchCollection mc = Regex.Matches(richTextBox1.Text, regex);
foreach (Match match in mc)
if (!match.ToString().Equals(""))
Console.WriteLine(match.ToString() + "\n");
I have tested, this works:
string cust = "CustomerNumberHello";
string[] str = System.Text.RegularExpressions.Regex.Split(cust, @"[a-z]+");
string str2 = cust.Remove(cust.IndexOf(str[1], 1));
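For completeness, a regex-free sketch that also copes with the edge cases (empty input, or no second uppercase letter). The class and method names are invented for this example.

```csharp
using System;

// Hypothetical helper: names are illustrative only.
public static class FirstWordExtractor
{
    // Returns the text up to (not including) the first uppercase letter
    // found after position 0, or the whole string if there is none.
    public static string FirstWord(string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return string.Empty;
        }

        int end = 1;
        while (end < source.Length && !char.IsUpper(source[end]))
        {
            end++;
        }
        return source.Substring(0, end);
    }
}
```

FirstWord("CustomerNumberHello") returns "Customer", and FirstWord("Customer") returns the input unchanged.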
