I need to compare a value in a string to what the user typed in a RichTextBox.
For example: the RichTextBox holds string rtbText = "aaaka", and I compare it to another variable, string comparable = "ka" (I want to compare backwards, from the end). I want the last 2 letters of rtbText (comparable has only 2 letters) to be replaced with something predetermined (it doesn't really matter what).
So rtbText should look like this:
rtbText = "aaa(something)"
This doesn't really have to be a comparison; it can just count the letters in comparable and, based on that count, remove that many letters from the end of rtbText and replace them with something else.
UPDATE:
Here is what I have:
int coLen = comparable.Length;
comparable = null;
TextPointer caretBack = rtb.CaretPosition.GetPositionAtOffset(coLen, LogicalDirection.Backward);
TextRange rtbText = new TextRange(rtb.CaretPosition, caretBack);
string text = rtbText.Text;
rtbText returns an empty string, or I get an error for everything longer than 3 characters. What am I doing wrong?
Let me elaborate a little further. I have a ListBox that holds replacements for values that the user types in the rtb. The replacement values come from there, meaning that I don't need to go through the whole text to check values. I just need to check the values right before the caret. I am comparing these values to what I have stored in another variable (comparable).
Please let me know if you don't understand something.
I did my best to explain what needs to be done.
Thank you
You could use Regex.Replace.
// requires: using System.Text.RegularExpressions;
// this replaces all occurrences of "ka" with "Replacement"
Regex replace = new Regex("ka");
string result = replace.Replace("aaaka", "Replacement");
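Note that Regex.Replace rewrites every occurrence, while the question only asks about the letters at the very end. A minimal sketch of the "replace only the tail" idea follows, written in Python as a language-neutral illustration; the function name replace_tail is made up for this example and is not part of any RichTextBox API.

```python
def replace_tail(text, comparable, replacement):
    """Replace `comparable` with `replacement` only if `text` ends with it."""
    # Guard against an empty `comparable`: text[:-0] would wipe the whole string.
    if comparable and text.endswith(comparable):
        return text[:-len(comparable)] + replacement
    return text


print(replace_tail("aaaka", "ka", "(something)"))  # aaa(something)
```

In C# the same shape would presumably be an EndsWith check followed by Substring; in a RichTextBox the text before the caret would have to be extracted first.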
gumenimeda, I had similar problems a few weeks ago. I found myself doing the following (I assume you will have more than one occurrence in the RichTextBox that you will need to change). Note that I did it for Windows Forms, where I have direct access to the Rtf text of the control; I am not quite sure it will work well in your scenario:
1. Find all the occurrences of the string (using IndexOf, for example) and store their positions in a List.
2. Sort the list in descending order (the highest index goes first, the one before it second, etc.).
3. Start replacing the occurrences directly in the RichTextBox, by removing the characters I don't need and inserting the characters I need.
The sorting in step 2 is necessary because we always want to start from the last occurrence and work up to the first. Starting from the first occurrence (or any other) and going down leads to an unpleasant surprise: if the chunk you remove and the chunk you insert differ in length, the string shifts and all later positions become invalid (for example, if the second occurrence was at position 12 and your new string is 2 characters longer than the original, it moves to position 14). This is not an issue if we go from the last to the first occurrence, as the change in the string does not affect the earlier occurrences still in the list.
Of course, I cannot be sure that this is the fastest way to achieve the desired result. It's just what I came up with and what worked for me.
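The back-to-front idea described above can be sketched on a plain string. This is Python for illustration only, not Windows Forms code, and replace_all_from_end is a hypothetical helper name.

```python
def replace_all_from_end(text, target, replacement):
    """Replace every occurrence of `target`, working from the last one to the first."""
    if not target:
        return text
    # Collect the start index of every non-overlapping occurrence.
    positions = []
    start = text.find(target)
    while start != -1:
        positions.append(start)
        start = text.find(target, start + len(target))
    # Highest index first, so earlier indices stay valid after each edit.
    for pos in sorted(positions, reverse=True):
        text = text[:pos] + replacement + text[pos + len(target):]
    return text


print(replace_all_from_end("kaXkaYka", "ka", "Replacement"))
```

On an immutable string this is equivalent to a single replace call; the ordering only matters when the edits are applied in place to a control or buffer, as in the answer.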
Good luck!
Related
I have a string as you can see below. What is the easiest and best way to take the middle value only?
123456789-11-abcd
So, I just want to take the middle value, which is between the two hyphens (-). I know we can split on - and then store the string array and find the right string, but that seems lengthy. Is there an easier way?
TIA
input_value = '123456789-11-abcd'  # Consider this as a string
# Split input_value by '-'; this generates a list, say Split_list
Split_list = input_value.split('-')
# Take the second element from Split_list
output_value = Split_list[1]
print(output_value)  # Required output: 11
I'm new to C#. I'm parsing for a lot number in a 2D barcode. The actual lot number 'A2351' is hidden in this barcode string "+M727PP011/$$3201001A2351S". I would like to break this barcode up into separate string blocks, but the delimiters are not consistent.
The letter prefix in front of the 4-digit lot number can be an 'A', 'P', or a 'D'. There is a single letter following the lot number that can be ignored.
string Delimiter = "/$$3";
//barcode format:M###PP###/$$3 ddmmyy lotnumprefix 'A' followed by lotNum
string lotNum= "+M727PP011/$$3201001A2351S";
string[] split = lotNum.Split(new[] {Delimiter}, StringSplitOptions.None);
How do I extract the lot number after the date?
Based on your initial example and then the subsequent edit in which you showed how you are solving this, it sounds like the lot number is always in the same place. It would be cleaner (and more in line with standard C# code) to use a single call to string.Substring(int,int) rather than the two lines you are using which also require pulling in the VB library. You just need to call Substring and give it the starting index and the length.
So this code:
string lotNum = Strings.Right(barcode, 6);
lotNum = lotNum.Remove((lotNum.Length - 1), 1);
Can be done with this single substring call:
string lotNum = barcode.Substring(barcode.Length - 6, 5);
Edit
Just further clarification on why it might be better to use the call to Substring. In C# string objects are immutable. That means that when you make the call to Strings.Right you are getting back a new string object. When you then call lotNum.Remove you do not "remove" a character from the existing string, a new string is allocated with the character(s) removed and is returned to you. So with your code there are two new string allocations when trying to extract the lot number. When you make the call to Substring you will get back a new string, but instead of getting a new string that you immediately then modify and get a second new string, you will only need to allocate one new string to extract the lot number. In the example you have given there probably would not be any noticeable performance/memory issue, but it is something that could potentially lead to trouble if this code was in a tight loop or something like that.
If you're just trying to get the lot number, it's really dependent on the format of the input string (is it a consistent length, are there any reliable prefixes/suffixes relative to the data you're trying to parse that you can reference from, etc). It looks like your data is definable by its static position in the string, so it looks like you could use the Substring method (with an index of 20?) to accomplish what you want.
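If the total length is not guaranteed but the "/$$3" delimiter and the 6-digit date are, the lot number can instead be located relative to the delimiter. Here is a rough Python sketch of that idea; the helper name extract_lot and the fixed widths (6 date digits, 5 lot characters) are assumptions read off the single sample barcode.

```python
def extract_lot(barcode, delimiter="/$$3", date_len=6, lot_len=5):
    """Return the lot number that follows the ddmmyy date after the delimiter."""
    _, found, tail = barcode.partition(delimiter)
    if not found:
        raise ValueError("delimiter not found in barcode")
    # tail looks like "201001A2351S": date, then prefix letter + 4 digits, then a trailing letter
    return tail[date_len:date_len + lot_len]


print(extract_lot("+M727PP011/$$3201001A2351S"))  # A2351
```

The C# equivalent would presumably combine IndexOf on the delimiter with a single Substring call, which keeps the one-allocation property discussed above.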
Just looking to see what the best way to approach the following situation would be.
I am trying to make a small job that reads in a txt file which has a thousand or so lines;
Each line is about 40 characters long (mostly numbers, some letter identifiers).
I have used
DataTable txtCache = new DataTable();
txtCache.Columns.Add(new DataColumn("Column1"));
string[] lines = System.IO.File.ReadAllLines(FILEcheck.Properties.Settings.Default.filePath);
foreach (string line in lines)
{
txtCache.Rows.Add(line);
}
However, what I really want to do is a bit confusing and hard to explain, so I'll do my best. An example line is below:
5498494000584454684840}eD44448774V6468465 Z
In the beginning of that long string is a "84", and then a "58" a little bit later. I need to do a comparison on these two numbers. They could be anything, but only a few combinations are acceptable in the file. They will always be in the same spot and same amount of characters (so it will always be 2 numbers and always in the 4-5 location). So I want to have 3 columns. I want the full string in 1 column, and then the 2 individual smaller numbers in columns of themselves. I can then compare them later on, and if there is an issue, I can return the full string which caused the issue.
Is this possible? I am just not sure how to parse out a substring based on character location and then loading it into a datatable.
Any advice would be appreciated. Thank you,
You could create a column for each of the items you are looking to store (whole string, first number, second number), and then add a row for each of the lines in the input file. You could just use the Substring method to parse out the two-digit numbers and store them. To do your analysis, you could parse the numbers out from the strings, or whatever else you need to do.
lines[0].Substring(3,2) will give you "84" in your above example. If you want the int, you could use Int32.Parse(lines[0].Substring(3,2))
Substring reference: http://msdn.microsoft.com/en-us/library/aka44szs%28v=vs.110%29.aspx
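To make the three-column idea concrete, here is a small Python sketch that uses plain dictionaries as a stand-in for DataTable rows. The offset of the first number (index 3) comes from the answer above; the offset of the second number (index 10) is an assumption counted from the sample line, and parse_line is a hypothetical name.

```python
def parse_line(line, first_at=3, second_at=10, width=2):
    """Split one fixed-width line into the full string and the two 2-digit fields."""
    return {
        "full": line,
        "first": line[first_at:first_at + width],
        "second": line[second_at:second_at + width],
    }


sample = "5498494000584454684840}eD44448774V6468465 Z"
row = parse_line(sample)
print(row["first"], row["second"])  # 84 58
```

In the C# version, each row would be added with something like txtCache.Rows.Add(line, line.Substring(3, 2), line.Substring(10, 2)) after declaring three columns.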
Checks if something contains any instance of any element in myString. Something may be "Sideboard: 1 Forest", "SB: 1 Mountain", "SB 1 Plains", etc. If something does contain any of the elements of the array, those elements will always be followed by a white space, a number, a white space, and a string: " 1 Swamp".
string[] myString = {"Side", "side", "Board", "board", "Sideboard", "sideboard", "SB", "sb", "SB:", "sb:"};
if(myString.Any(s => something.Contains(s)))
{
// newSomething = something but with any instance of any element in myString removed
// from the start of something up to the first whitespace.
}
I need help with the removing part of the comment in the above if statement block.
EDIT
Those are some blunt responses, but I understand!
I did go through the introduction and tried to search for relevant material but couldn't find this situation. This is a little program I made for personal use, not for an assignment. I do not know anything about regular expressions.
As for clarification, at the point in the program where "something" is found to contain any of the elements in the "myString" array, I then want to remove those elements only from the start of the string in "something".
Ex: something = "Sideboard: 1 Inside Out"
I want to remove "myString" elements only from the beginning of the string and before the number because the strings after the number may also contain elements of "myString".
Also, upon submitting this question, I instantly figured out a way to do the removing: I just used TrimStart() with an array containing the individual characters of the elements in "myString". This did what I wanted but I kept this question up to look for a more elegant solution. So, for all intents and purposes, this question is already answered.
Thanks for the help!
string newSomething = something.Substring(0, something.IndexOf(' '));
The above code will return a string ending at the first instance of a space in the 'something' string. If there are no spaces in the 'something' string, there will be an exception.
Not sure if this is what you were looking for, good luck.
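For the removal itself (dropping the marker at the start and keeping everything after the first space), one possible shape is sketched below in Python. The marker list mirrors the myString array from the question, compared case-insensitively with a trailing colon ignored; strip_sideboard_prefix is a made-up name.

```python
MARKERS = {"side", "board", "sideboard", "sb"}


def strip_sideboard_prefix(something):
    """Drop a leading marker such as "Sideboard:" or "SB" up to the first space."""
    head, space, rest = something.partition(" ")
    # Only strip when the first token really is one of the markers.
    if space and head.rstrip(":").lower() in MARKERS:
        return rest
    return something


print(strip_sideboard_prefix("Sideboard: 1 Inside Out"))  # 1 Inside Out
```

Because only the first token is inspected, a card name such as "Inside Out" after the number is left alone, which is the case the question was worried about.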
If I have a string with words and no spaces, how should I parse those words, given that I have a dictionary/list that contains those words?
For example, if my string is "thisisastringwithwords" how could I use a dictionary to create an output "this is a string with words"?
I hear that using the data structure Tries could help but maybe if someone could help with the pseudo code? For example, I was thinking that maybe you could index the dictionary into a trie structure, then follow each char down the trie; problem is, I'm unfamiliar with how to do this in (pseudo)code.
I'm assuming that you want an efficient solution, not the obvious one where you repeatedly check if your text starts with a dictionary word.
If the dictionary is small enough, I think you could try and modify the standard KMP algorithm. Basically, build a finite-state machine on your dictionary which consumes the text character by character and yields the constructed words.
EDIT: It appeared that I was reinventing tries.
I already did something similar. You cannot use a simple dictionary lookup; the result will be messy. It also depends on whether you only have to do this once or as a whole program.
My solution was to:
Connect to a database with working words from a dictionary list (for example an online dictionary).
Filter long and short words in the dictionary and check if you want to trim stuff (for example, don't use words with only one character like 'I').
Start with short words and compare your bigString with the database dictionary.
Now you need to create a "table of possibility", because a lot of words can fit 100% and still be wrong. The longer the word, the more sure you can be that it is the right one.
It is CPU intensive, but the result can be precise.
So let's say you are using a small dictionary of 10,000 words and 3,000 of them have a length of 8 characters. You need to compare the start of your bigString with all 3,000 words, and only if a match is found may you proceed to the next word. If you have 2000 characters in your bigString, you need at least about (2000 chars / 8 average chars) = 250 full comparison loops.
For me, I also built a small check for misspelled words into the comparison.
Example of the procedure (don't copy-paste):
Dim bigString As String = "helloworld.thisisastackoverflowtest!"
Dim dictionary As New List(Of String) 'contains the original words
dictionary.Add("Hello")
dictionary.Add("World")
dictionary.Add("this")
dictionary.Add("is")
dictionary.Add("a")
dictionary.Add("stack")
dictionary.Add("over")
dictionary.Add("flow")
dictionary.Add("stackoverflow")
dictionary.Add("test")
dictionary.Add("!")
'Drop too-short words and lower-case the rest to make matching case insensitive.
'(Removing items inside a For Each throws, so build a new list instead; needs System.Linq.)
dictionary = dictionary.Where(Function(w) w.Length >= 1).Select(Function(w) w.ToLower()).ToList()
Dim words As New List(Of String) 'the words found so far, in order
Dim i As Integer = 0 'start at the beginning
Do While i < bigString.Length
    'String is the dictionary word. Double is its weight; the word length is used here as a stand-in for your own weighting function
    Dim ResultComparer As New Dictionary(Of String, Double)
    For Each word In dictionary
        'only accept words that start exactly at position i
        If String.CompareOrdinal(bigString, i, word, 0, word.Length) = 0 Then
            ResultComparer(word) = word.Length 'long words are better and get a higher weight
        End If
    Next
    If ResultComparer.Count > 0 Then
        Dim best As String = ResultComparer.OrderByDescending(Function(p) p.Value).First().Key
        words.Add(best)
        i += best.Length
    Else
        i += 1 'nothing matches here (for example the "."), so skip one character
    End If
Loop
I told you that it seems like an impossible task. But you can have a look at this related SO question - it may help you.
If you are sure you have all the words of the phrase in the dictionary, you can use that algo:
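// requires: import java.util.Set;
String phrase = "thisisastringwithwords";
String fullPhrase = "";
Set<String> myDictionary = Set.of("this", "is", "a", "string", "with", "words");
boolean matched;
do {
    matched = false;
    for (String item : myDictionary) {
        if (phrase.startsWith(item)) {
            fullPhrase += item + " ";
            // strings are immutable, so cut the matched word off by reassigning
            phrase = phrase.substring(item.length());
            matched = true;
            break;
        }
    }
    // stop when nothing matched, otherwise the loop would never end
} while (matched && !phrase.isEmpty());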
There are many complications, such as several dictionary items starting with the same letters, so the code would need to be changed to use some kind of tree search (a BST or similar).
This is the exact problem one has when trying to programmatically parse languages like Chinese where there are no spaces between words. One method that works with those languages is to start by splitting text on punctuation. This gives you phrases. Next you iterate over the phrases and try to break them into words starting with the length of the longest word in your dictionary. Let's say that length is 13 characters. Take the first 13 characters from the phrase and see if it is in your dictionary. If so, take it as a correct word for now, move forward in the phrase and repeat. Otherwise, shorten your substring to 12 characters, then 11 characters, etc.
This works extremely well, but not perfectly because we've accidentally put in a bias towards words that come first. One way to remove this bias and double check your result is to repeat the process starting at the end of the phrase. If you get the same word breaks you can probably call it good. If not, you have an overlapping word segment. For example, when you parse your sample phrase starting at the end you might get (backwards for emphasis)
words with string a Isis th
At first, the word Isis (Egyptian Goddess) appears to be the correct word. When you find that "th" is not in your dictionary, however, you know there is a word segmentation problem nearby. Resolve this by going with the forward segmentation result "this is" for the non-aligned sequence "thisis" since both words are in the dictionary.
A less common variant of this problem is when adjacent words share a sequence which could go either way. If you had a sequence like "archand" (to make something up), should it be "arc hand" or "arch and"? The way to determine is to apply a grammar checker to the results. This should be done to the whole text anyway.
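The forward and backward passes described in this answer can be sketched as follows. This is Python used purely for illustration; the tiny word set (including a lower-cased "isis") is an assumption chosen to reproduce the example above, and unknown characters are emitted as single-character tokens.

```python
def forward_max_match(text, words):
    """Greedy longest-match segmentation scanning left to right."""
    longest = max(map(len, words))
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                out.append(text[i:i + size])
                i += size
                break
        else:
            out.append(text[i])  # no dictionary word starts here
            i += 1
    return out


def backward_max_match(text, words):
    """Greedy longest-match segmentation scanning right to left."""
    longest = max(map(len, words))
    out, j = [], len(text)
    while j > 0:
        for size in range(min(longest, j), 0, -1):
            if text[j - size:j] in words:
                out.append(text[j - size:j])
                j -= size
                break
        else:
            out.append(text[j - 1])  # no dictionary word ends here
            j -= 1
    out.reverse()
    return out


words = {"this", "is", "a", "string", "with", "words", "isis"}
print(forward_max_match("thisisastringwithwords", words))
print(backward_max_match("thisisastringwithwords", words))
```

With this word set the two passes disagree only around "thisis" (the backward pass picks "isis" and strands "t" and "h"), which is the signal to fall back to the forward result for that stretch.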
OK, I will make a hand-wavy attempt at this. The perfect(ish) data structure for your problem is, as you've said, a trie made up of the words in the dictionary. A trie is best visualised as a DFA, a nice state machine where you go from one state to the next on every new character. This is really easy to do in code; a Java(ish) style class for this would be:
class State
{
    String matchedWord;                // the dictionary word ending at this state, or null if none
    Map<Character, State> mapChildren; // next state for each character
}
From here on, building the trie is easy. It's like having a rooted tree structure with each node having multiple children. Each child is visited on one character transition. Using a HashMap kind of structure cuts down the time needed to look up character-to-next-State mappings. Alternatively, if all you have are the 26 characters of the alphabet, a fixed-size array of 26 would do the trick as well.
Now, assuming all of that made sense, you have a trie, but your problem still isn't fully solved. This is where you start doing what regular expression engines do: walk down the trie, keep track of states which match a whole word in the dictionary (that's what I had matchedWord for in the State structure), and use some backtracking logic to jump to a previous match state if the current trail hits a dead end. I know it's general, but given the trie structure, the rest is fairly straightforward.
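A rough Python rendering of the trie plus backtracking walk might look like the following. The names build_trie and segment are illustrative, and the recursion is the plain backtracking described above without memoisation, so it can be slow on adversarial input.

```python
class State:
    def __init__(self):
        self.matched_word = None  # dictionary word ending at this state, if any
        self.children = {}        # character -> next State


def build_trie(words):
    root = State()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, State())
        node.matched_word = word
    return root


def segment(text, root, start=0):
    """Return a list of words covering text[start:], or None if impossible."""
    if start == len(text):
        return []
    node = root
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break  # dead end: no dictionary word continues along this path
        if node.matched_word is not None:
            rest = segment(text, root, i + 1)
            if rest is not None:
                return [node.matched_word] + rest
            # otherwise backtrack: keep walking to try a longer word
    return None


trie = build_trie(["this", "is", "a", "string", "with", "words"])
print(" ".join(segment("thisisastringwithwords", trie)))
```

Each recursive call restarts at the trie root after a matched word, and returning None is the "jump back to a previous match state" step.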
If you have a dictionary of words and need a quick implementation, this can be solved efficiently with dynamic programming in O(n^2) time, assuming the dictionary lookups are O(1). Below is some C# code; the substring extraction and the dictionary lookup could be improved.
using System;
using System.Collections.Generic;

public static class StringSplitter
{
    public static String[] StringToWords(String str, HashSet<string> words)
    {
        //bps[j] = start index of a valid word that ends just before index j, or -1 if none found yet
        int[] bps = new int[str.Length + 1];
        for (int i = 0; i < bps.Length; i++)
            bps[i] = -1;
        for (int i = 0; i < str.Length; i++)
        {
            //Skip start positions that no sequence of words reaches
            if (i > 0 && bps[i] == -1)
                continue;
            for (int j = i + 1; j <= str.Length; j++)
            {
                if (bps[j] == -1)
                {
                    //Destination cell doesn't have a valid backpointer yet
                    //Try with the current substring
                    String s = str.Substring(i, j - i);
                    if (words.Contains(s))
                        bps[j] = i;
                }
            }
        }
        //Backtrack to recover the sequence and then reverse
        //(the result is empty if no segmentation exists)
        List<String> seg = new List<string>();
        for (int bp = str.Length; bps[bp] != -1; bp = bps[bp])
            seg.Add(str.Substring(bps[bp], bp - bps[bp]));
        seg.Reverse();
        return seg.ToArray();
    }
}
Building a HashSet with the word list from /usr/share/dict/words and testing with
foreach (var s in StringSplitter.StringToWords("thisisastringwithwords", dict))
Console.WriteLine(s);
I get the output "t hi sis a string with words". As others have pointed out, this algorithm will return a valid segmentation (if one exists), but it may not be the segmentation you expect. The presence of short words reduces the segmentation quality; you might be able to add a heuristic that favours longer words when two valid sub-segmentations reach the same element.
There are more sophisticated methods, such as finite state machines and language models, that can generate multiple segmentations and apply probabilistic ranking.
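One simple way to favour longer words is to keep, for every position, the segmentation that uses the fewest words. The sketch below is Python rather than C#, the name segment_fewest_words is made up, and "fewest words" is only one possible heuristic, not necessarily the one the answer had in mind.

```python
def segment_fewest_words(text, words):
    """Dynamic programming: segment `text` using as few dictionary words as possible."""
    n = len(text)
    # best[j] = (word count, start of last word) for text[:j], or None if unreachable
    best = [None] * (n + 1)
    best[0] = (0, None)
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] is not None and text[i:j] in words:
                count = best[i][0] + 1
                if best[j] is None or count < best[j][0]:
                    best[j] = (count, i)
    if best[n] is None:
        return None
    # Follow the back-pointers from the end, then reverse.
    out, j = [], n
    while j > 0:
        i = best[j][1]
        out.append(text[i:j])
        j = i
    out.reverse()
    return out


words = {"t", "hi", "sis", "this", "is", "a", "string", "with", "words"}
print(" ".join(segment_fewest_words("thisisastringwithwords", words)))
```

With this word set both "t hi sis ..." and "this is ..." are valid, and the fewest-words rule picks the latter.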