Algorithm to extract 2 levels down

Algorithm to extract 2 levels down - c#

Hi guys i have the following string
&|L1|L2|L3|&
I would like to extract L1 out and do some if statement to it. Meaning i would have to read for each & & and | | then extract L1?
THanks for your help!
                                          
THe string is not fixed >.< sometimes its
&|L1|A2|A3|& %3123%

If you know your data is always formatted like this, and you always want the substring contained between the first and second pipe characters, a simple regex will work (this is in C#, as that's what you've tagged the question with):
string testString = "&|L1|L2|L3|&";
string match = Regex.Match(testString, #"^.+?\|(.+?)\|").Groups[1].Value;
You can then perform whatever logic you need to on "match". Regex explanation: match the start of the string followed by a lazy match of any characters up to the first pipe, lazy match and capture any characters up to the next pipe.

You can check by looping the each char or all the chars at a time. See the below code
string Str = "&|L1|L2|L3|&";
for(int i= 0;i<Str.Length;i++)
{
if (Str[i] == 'L' && Str[i+1] == '1')
{
MessageBox.Show(Str[i].ToString() + Str[i+1].ToString() + " Found");
}
}

Related

C#: Remove Excess Text From String

Okay, so after looking around here on SO, I have found a solution that meets about 95% of my requirement, although I believe it may need to be redone at this point.
ISSUE
Say I have a value range supplied as "1000 - 1009 ABC1 ABC SOMETHING ELSE" where I just need the 1000 - 1009 part. I need to be able to remove excess characters from the string supplied, even if they truly are accepted characters, but only if they are part of secondary strings with text. (Sorry if that description seems odd, my mind isn't full power today.)
CURRENT SOLUTION
I currently have a simple method utilizing Linq to return only accepted characters, however this will return "1000 - 10091" which is not the range I am needing. I've thought about looping through the strings individual characters and comparing to previous characters as I go using IsDigit and IsLetter to my advantage, but then comes the issue of replacing the unacceptable characters or removing them. I think if I gave it a day or two I could figure it out with a clear mind, but it needs to be done by the end of the day, and I am banging my head against the keyboard.
void RemoveExcessText(ref string val) {
string allowedChars = "0123456789-+>";
val = new string(val.Where(c => allowedChars.Contains(c)).ToArray());
}
// Alternatively?
char previousChar = ' ';
for (int i = 0; i < val.Length; i++) {
if (char.IsLetter(val[i])) {
previousChar = val[i];
val.Remove(i, 1);
} else if (char.IsDigit(val[i])) {
if (char.IsLetter(previousChar)) {
val.Remove(i, 1);
}
}
}
But how do I calculate white space and leave in the +, -, and > charactrers? I am losing my mind on this one today.

Why not use a regular expression?
Regex.Match("1000 - 1009 ABC1 ABC SOMETHING ELSE", #"^(\d+)([\s\-]+)(\d+)");
Should give you what you want
I made a fiddle

You use a regular expression with a capturing group:
Regex r = new Regex("^(?<v>[-0-9 ]+?)");
This means "from the start of the input string (^) match [0 to 9 or space or hyphen] and keep going for as many occurrences of these characters as are available (+?) and store it into variable v (?)"
We get it out like this:
r.Matches(input)[0].Groups["v"].Value
Note though that if the input string doesn't match, the match collection will be 0 long and a call to [0] will crash. To this end you might want to robust it up with some extra error checking:
MatchCollection mc = r.Matches(input);
if(mc.Length > 0)
MessageBox.Show(mc[0].Groups["v"].Value;

You could match this with a regular expression. \d{1,4} means match a decimal digit at least once up to 4 times. Followed by space, hyphen, space, and 1 to 4 digits again, then anything else. Only the part inside parenthesis is output in your results.
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var pattern = #"(^\d{1,4} - \d{1,4}).*";
string input = ("1000 - 1009 ABC1 ABC SOMETHING ELSE");
string replacement = "$1";
string result = Regex.Replace(input, pattern, replacement);
Console.WriteLine(result);
}
}
https://dotnetfiddle.net/cZGlX4

How to remove multiple, repeating & unnecessary punctuation from string in C#?

Considering strings like this:
"This is a string....!"
"This is another...!!"
"What is this..!?!?"
...
// There are LOTS of examples of weird/angry sentence-endings like the ones above.
I want to replace the unnecessary punctuation at the end to make it look like this:
"This is a string!"
"This is another!"
"What is this?"
What I basically do is:
- split by space
- check if last char in string contains a punctuation
- start replacing with the patterns below
I have tried a very big ".Replace(string, string)" function, but it does not work - there has to be a simpler regex I guess.
Documentation:
Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
As well as:
Because this method returns the modified string, you can chain together successive calls to the Replace method to perform multiple replacements on the original string.
Anything is wrong here.
EDIT: ALL the proposed solutions work fine! Thank you very much!
This one was the best suited solution for my project:
Regex re = new Regex("[.?!]*(?=[.?!]$)");
string output = re.Replace(input, "");

Your solution works almost fine (demo), the only issue is when the same sequence could be matched starting at different spots. For example, ..!?!? from your last line is not part of the substitution list, so ..!? and !? get replaced by two separate matches, producing ?? in the output.
It looks like your strategy is pretty straightforward: in a chain of multiple punctuation characters the last character wins. You can use regular expressions to do the replacement:
[!?.]*([!?.])
and replace it with $1, i.e. the capturing group that has the last character:
string s;
while ((s = Console.ReadLine()) != null) {
s = Regex.Replace(s, "[!?.]*([!?.])", "$1");
Console.WriteLine(s);
}
Demo

Simply
[.?!]*(?=[.?!]$)
should do it for you. Like
Regex re = new Regex("[.?!]*(?=[.?!]$)");
Console.WriteLine(re.Replace("This is a string....!", ""));
This replaces all punctuations but the last with nothing.
[.?!]* matches any number of consecutive punctuation characters, and the (?=[.?!]$) is a positive lookahead making sure it leaves one at the end of the string.
See it here at ideone.

Or you can do it without regExps:
string TrimPuncMarks(string str)
{
HashSet<char> punctMarks = new HashSet<char>() {'.', '!', '?'};
int i = str.Length - 1;
for (; i >= 0; i--)
{
if (!punctMarks.Contains(str[i]))
break;
}
// the very last punct mark or null if there were no any punct marks in the end
char? suffix = i < str.Length - 1 ? str[str.Length - 1] : (char?)null;
return str.Substring(0, i+1) + suffix;
}
Debug.Assert("What is this?" == TrimPuncMarks("What is this..!?!?"));
Debug.Assert("What is this" == TrimPuncMarks("What is this"));
Debug.Assert("What is this." == TrimPuncMarks("What is this."));

string.Replace not working on Unicode characters "â˜…"

I'm trying to convert the following string:
â˜… Bayonet | Slaughter (Minimal Wear)
To simply:
Bayonet | Slaughter (Minimal Wear)
To do this I am trying to use the following: (name is the string containing the above)
name = name.Replace("â˜… ", "");
However, the string remains unchanged. I am retrieving the string using a JSON response which appears to suggest that strange text(a star symbol) is made using the following sequence:
\u2605
However, the string seems to be automatically converted to that strange string at the top of the post. So why does the name.replace function not work to replace those characters?

It's probably an encoding problem.
In the C# part, I'm suggesting you to try to write :
name = name.Replace("\u2605", string.Empty);
Rather than to hard-code the correspondant character.
.Net will take care of the \u2605 for you.

It is unclear to me if the ... is just normal text, I will assume it is. Use regex replace with all characters not wanted in the set [â˜] instead:
Regex.Replace("â˜… Bayonet | Slaughter (Minimal Wear)", #"[â˜]", string.Empty);
Here is the result:
… Bayonet | Slaughter (Minimal Wear)
Note with the actual full string I could write a better pattern match to replace in a more surgical fashion.
Update
Since these character(s) are only at the beginning, that makes our task easier and doesn't facilitate the traversal of the whole result; speeding up the process!
Here is a pattern ^[^\{] to use with regex replace.
It says match anything at the beginning of the string ^ with our negation set [^ ] that will search for anything but a { which is the start of JSON for one or more items +. The one or more means if there is no match, there will not be a replace; only a match at the beginning will be extracted.
string data = "â˜{\"success\":true,\"rgInventory\":{\"1509597855\":";
Console.WriteLine (data); // â˜{"success":true,"rgInventory":{"1509597855":
string pattern = #"^[^{]+";
Console.WriteLine (Regex.Replace(data, pattern, string.Empty));
// {"success":true,"rgInventory":{"1509597855":

Since none of the solutions above work for you, it might be simpler to just cut the weird string off instead of breaking your head trying to guess the correct form.
the following way you will avoid the headache of doing it all over again if the string somehow changes:
since we know the weird occurrence is always at the beginning of the string, so instead of guessing what character that is, we just check if its not 0-9 a-z or A-Z and trim it off...
class Program
{
static void Main(string[] args)
{
string name = "â˜… Bayonet | Slaughter (Minimal Wear)";
name = fix_string(name);
Console.WriteLine(name);
}
static string fix_string(string str)
{
int fix=0;
while (fix<str.Length && (!(str[fix] > (char)48 && str[fix] < (char)57) && !(str[fix] > (char)65 && str[fix] < (char)90) && !(str[fix] > (char)97 && str[fix] < (char)122)))
{
fix++;
}
return str.Substring(fix,str.Length-fix);
}
}
PS: here is the ascii table, so you can adjust what characters to trim from the beginning of your string, and adjust the function to test the whole string if you wish later
judging from your string i assume you wont need more than ascii characters

This code worked for me, perhaps you can interpret it into your code via copy paste? See if that works! Unsure completely if it's what you want, although.
String str = "â˜… Bayonet | Slaughter (Minimal Wear)";
str = str.Replace("â˜…","");
Console.WriteLine(str);
Console.ReadLine();

How to find repeatable characters

I can't understand how to solve the following problem:
I have input string "aaaabaa" and I'm trying to search for string "aa" (I'm looking for positions of characters)
Expected result is
0 1 2 5
aa aabaa
a aa abaa
aa aa baa
aaaab aa
This problem is already solved by me using another approach (non-RegEx).
But I need a RegEx I'm new to RegEx so google-search can't help me really.
Any help appreciated! Thanks!
P.S.
I've tried to use (aa)* and "\b(\w+(aa))*\w+" but those expressions are wrong

You can solve this by using a lookahead
a(?=a)
will find every "a" that is followed by another "a".
If you want to do this more generally
(\p{L})(?=\1)
This will find every character that is followed by the same character. Every found letter is stored in a capturing group (because of the brackets around), this capturing group is then reused by the positive lookahead assertion (the (?=...)) by using \1 (in \1 there is the matches character stored)
\p{L} is a unicode code point with the category "letter"
Code
String text = "aaaabaa";
Regex reg = new Regex(#"(\p{L})(?=\1)");
MatchCollection result = reg.Matches(text);
foreach (Match item in result) {
Console.WriteLine(item.Index);
}
Output
0
1
2
5

The following code should work with any regular expression without having to change the actual expression:
Regex rx = new Regex("(a)\1"); // or any other word you're looking for.
int position = 0;
string text = "aaaaabbbbccccaaa";
int textLength = text.Length;
Match m = rx.Match(text, position);
while (m != null && m.Success)
{
Console.WriteLine(m.Index);
if (m.Index <= textLength)
{
m = rx.Match(text, m.Index + 1);
}
else
{
m = null;
}
}
Console.ReadKey();
It uses the option to change the start index of a regex search for each consecutive search. The actual problem comes from the fact that the Regex engine, by default, will always continue searching after the previous match. So it will never find a possible match within another match, unless you instruct it to by using a Look ahead construction or by manually setting the start index.
Another, relatively easy, solution is to just stick the whole expression in a forward look ahead:
string expression = "(a)\1"
Regex rx2 = new Regex("(?=" + expression + ")");
MatchCollection ms = rx2.Matches(text);
var indexes = ms.Cast<Match>().Select(match => match.Index);
That way the engine will automatically advance the index by one for every match it finds.
From the docs:
When a match attempt is repeated by calling the NextMatch method, the regular expression engine gives empty matches special treatment. Usually, NextMatch begins the search for the next match exactly where the previous match left off. However, after an empty match, the NextMatch method advances by one character before trying the next match. This behavior guarantees that the regular expression engine will progress through the string. Otherwise, because an empty match does not result in any forward movement, the next match would start in exactly the same place as the previous match, and it would match the same empty string repeatedly.

Try this:
How can I find repeated characters with a regex in Java?
It is in java, but the regex and non-regex way is there. C# Regex is very similar to the Java way.

Google-like search query tokenization & string splitting

I'm looking to tokenize a search query similar to how Google does it. For instance, if I have the following search query:
the quick "brown fox" jumps over the "lazy dog"
I would like to have a string array with the following tokens:
the
quick
brown fox
jumps
over
the
lazy dog
As you can see, the tokens preserve the spaces with in double quotes.
I'm looking for some examples of how I could do this in C#, preferably not using regular expressions, however if that makes the most sense and would be the most performant, then so be it.
Also I would like to know how I could extend this to handle other special characters, for example, putting a - in front of a term to force exclusion from a search query and so on.

So far, this looks like a good candidate for RegEx's. If it gets significantly more complicated, then a more complex tokenizing scheme may be necessary, but your should avoid that route unless necessary as it is significantly more work. (on the other hand, for complex schemas, regex quickly turns into a dog and should likewise be avoided).
This regex should solve your problem:
("[^"]+"|\w+)\s*
Here is a C# example of its usage:
string data = "the quick \"brown fox\" jumps over the \"lazy dog\"";
string pattern = #"(""[^""]+""|\w+)\s*";
MatchCollection mc = Regex.Matches(data, pattern);
foreach(Match m in mc)
{
string group = m.Groups[0].Value;
}
The real benefit of this method is it can be easily extened to include your "-" requirement like so:
string data = "the quick \"brown fox\" jumps over " +
"the \"lazy dog\" -\"lazy cat\" -energetic";
string pattern = #"(-""[^""]+""|""[^""]+""|-\w+|\w+)\s*";
MatchCollection mc = Regex.Matches(data, pattern);
foreach(Match m in mc)
{
string group = m.Groups[0].Value;
}
Now I hate reading Regex's as much as the next guy, but if you split it up, this one is quite easy to read:
(
-"[^"]+"
|
"[^"]+"
|
-\w+
|
\w+
)\s*
Explanation
If possible match a minus sign, followed by a " followed by everything until the next "
Otherwise match a " followed by everything until the next "
Otherwise match a - followed by any word characters
Otherwise match as many word characters as you can
Put the result in a group
Swallow up any following space characters

I was just trying to figure out how to do this a few days ago. I ended up using Microsoft.VisualBasic.FileIO.TextFieldParser which did exactly what I wanted (just set HasFieldsEnclosedInQuotes to true). Sure it looks somewhat odd to have "Microsoft.VisualBasic" in a C# program, but it works, and as far as I can tell it is part of the .NET framework.
To get my string into a stream for the TextFieldParser, I used "new MemoryStream(new ASCIIEncoding().GetBytes(stringvar))". Not sure if this is the best way to do it.
Edit: I don't think this would handle your "-" requirement, so maybe the RegEx solution is better

Go char by char to the string like this: (sort of pseudo code)
array words = {} // empty array
string word = "" // empty word
bool in_quotes = false
for char c in search string:
if in_quotes:
if c is '"':
append word to words
word = "" // empty word
in_quotes = false
else:
append c to word
else if c is '"':
in_quotes = true
else if c is ' ': // space
if not empty word:
append word to words
word = "" // empty word
else:
append c to word
// Rest
if not empty word:
append word to words

I was looking for a Java solution to this problem and came up with a solution using #Michael La Voie's. Thought I would share it here despite the question being asked for in C#. Hope that's okay.
public static final List<String> convertQueryToWords(String q) {
List<String> words = new ArrayList<>();
Pattern pattern = Pattern.compile("(\"[^\"]+\"|\\w+)\\s*");
Matcher matcher = pattern.matcher(q);
while (matcher.find()) {
MatchResult result = matcher.toMatchResult();
if (result != null && result.group() != null) {
if (result.group().contains("\"")) {
words.add(result.group().trim().replaceAll("\"", "").trim());
} else {
words.add(result.group().trim());
}
}
}
return words;
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Algorithm to extract 2 levels down - c#

Hi guys i have the following string &|L1|L2|L3|& I would like to extract L1 out and do some if statement to it. Meaning i would have to read for each & & and | | then extract L1? THanks for your help! THe string is not fixed >.< sometimes its &|L1|A2|A3|& %3123%

You can check by looping the each char or all the chars at a time. See the below code string Str = "&|L1|L2|L3|&"; for(int i= 0;i<Str.Length;i++) { if (Str[i] == 'L' && Str[i+1] == '1') { MessageBox.Show(Str[i].ToString() + Str[i+1].ToString() + " Found"); } }

Related

C#: Remove Excess Text From String

How to remove multiple, repeating & unnecessary punctuation from string in C#?

string.Replace not working on Unicode characters "â˜…"

How to find repeatable characters

Google-like search query tokenization & string splitting

Categories

Resources