Writing Dynamic IF Functions by user entry - c#

I'm still pretty new with coding(C#) and am writing a file reader based on user criteria(What string they want to search).
What my program does:
The user can enter search a sting / multiple strings (with AND OR functionality)
The program interprets the user entry and re-writes the string into code. e.g.
string fileLine = File.ReadAllText(line);
USER:
(hi AND no) OR yes
PROGRAM:
if((fileLine.Contains(hi) && fileLine.Contains(no)) || fileLine.Contains(yes))
What I'm trying to do:
When matching the string to the File string i use an IF Function:
if(fileLine.Contains(hi))
{
//do A LOT of stuff here.
}
My first idea was to make a string out of the the entered string and replace the "condition" in the IF Function.
Am i going about this in the wrong way? What would the best way of achieving this be?

Parameterizing the input is a good idea. It looks like you're really trying to do two things: first, you want to determine what the user's input was, and then you want to act on that input.
So, first, read the user-inputted line. If your typical expected input is something like (bob AND casey) OR josh then it should be pretty simple to implement a regex-based grammar on the backend, though as juharr says, it's more complicated for you that way. But, assuming user input is AND/OR and grouped by parenthesis, you'll likely want to break it down like this:
For each unit of filtering - each parenthetical - you want to hold 2 pieces of information: the items being filtered on (bob, casey) and the operator (AND)
These units are order sensitive; filtering on (bob AND casey) OR josh is different from bob AND (casey OR josh). So you also want a higher level representation of the order to search in. For (bob AND casey) OR josh, you'll be determining validity of the search based on {result of bob AND casey} OR {result of josh}.
These are probably objects. :)
When you have your user input in a standardized form, you will want to sanity check it, and inform the user if you found something you can't parse (an unclosed parentheses, etc).
After informing the user, then and only then should you perform the actual file searching. I would suggest treating each unit of search (item 2 above) as its own "search" and using a switch statement for the search operators, e.g.
switch (operator) {
case operator.AND:
inString = fileLine.Contains(itemOne) && fileLine.Contains(itemTwo)
case operator.OR:
inString = fileLine.Contains(itemOne) || fileLine.Contains(itemTwo)
}
Additionally, you will want to handle cases where you are comparing bool AND string.Contains and bool OR string.Contains.
If you have something like bob AND josh OR casey you could do a one-level linear comparison where you simply chopped them up and passed them through the comparison one at a time (bob and josh returns a bool, then pass that bool to a comparison for a bool OR string.Contains operation)
This structure also means you're less likely to get spaghetti code if your scope changes (meaning, you won't have an ever-increasing series of if statements), and you can handle unpredictable input and notify the user if something is wrong.

Related

Filter text in a variable

I have a variable (serial_and_username_and_subType) that contains this type of text:
CT-AF-23-GQG %username1% *subscriptionType*
DHR-345349-E %username2% *subscriptionType*
C3T-AF434-234-GQG %username3% *subscriptionType*
34-7-HHDHFD-DHR-345349-E %username4% *subscriptionType*
example: ST-NN1-CQ-QQQ-G12 %RandomDUDE12% *Lifetime*
after that, i have an IF instruction that checks if the user inputs something that is present in serial_and_username_and_subType.
if (userInput.Contains
(serial_and_username_and_subType))......
then, what i would like to do (but i am having troubles) is that when someone enters a serial, the program prints the corrispective username and subscription:
for example:
Please enter your Serial:
> ST-NN1-CQ-QQQ-G12
Welcome, RandomDUDE12!
You currently have a Lifetime subscription!
does anyone know a method or a way to obtain what i need?
You are already using Contains(). The other things you could use are
Substring()
Split()
IndexOf()
Split() is probably the easiest one as long as you can guarantee that neither % nor * are part of the serial, username or license:
var s = "ST-NN1-CQ-QQQ-G12 %RandomDUDE12% *Lifetime*";
var splitPercent = s.Split('%');
Console.WriteLine(splitPercent[1]);
var splitStar = s.Split('*');
Console.WriteLine(splitStar[1]);
This approach will work fine as long as you have few licenses only (maybe a few thousand are ok, because PCs are fast). If you have many licenses (like millions), you probably want to separate all that information so that it is not in a string, but a data structure. You would then use a dictionary and access the information directly, instead of iterating through all of them.

Speech to text in c#

I have a c# program that lets me use my microphone and when I speak, it does commands and will talk back. For example, when I say "What's the weather tomorrow?" It will reply with tomorrows weather.
The only problem is, I have to type out every phrase I want to say and have it pre-recorded. So if I want to ask for the weather, I HAVE to say it like i coded it, no variations. I am wondering if there is code to change this?
I want to be able to say "Whats the weather for tomorrow", "whats tomorrows weather" or "can you tell me tomorrows weather" and it tell me the next days weather, but i don't want to have to type in each phrase into code. I seen something out there about e.Result.Alternates, is that what I need to use?
This cannot be done without involving linguistic resources. Let me explain what I mean by this.
As you may have noticed, your C# program only recognizes pre-recorded phrases and only if you say the exact same words. (As an aside node, this is quite an achievement in itself, because you can hardly say a sentence twice without altering it a bit. Small changes, that is, e.g. in sound frequency or lengths, might not be relevant to your colleagues, but they matter to your program).
Therefore, you need to incorporate a kind of linguistic resource in your program. In other words, make it "understand" facts about human language. Two suggestions with increasing complexity below. All apporaches assume that your tool is capable of tokenizing an audio input stream in a sensible way, i.e. extract words from it.
Pattern matching
To avoid hard-coding the sentences like
Tell me about the weather.
What's the weather tomorrow?
Weather report!
you can instead define a pattern that matches any of those sentences:
if a sentence contains "weather", then output a weather report
This can be further refined in manifold ways, e.g. :
if a sentence contains "weather" and "tomorrow", output tomorrow's forecast.
if a sentence contains "weather" and "Bristol", output a forecast for Bristol
This kind of knowledge must be put into your program explicitly, for instance in the form of a dictionary or lookup table.
Measuring Similarity
If you plan to spend more time on this, you could implement a means for finding the similarity between input sentences. There are many approaches to this as well, but a prominent one is a bag of words, represented as a vector.
In this model, each sentence is represented as a vector, each word in it present as a dimension of the vector. For example, the sentence "I hate green apples" could be represented as
I = 1
hate = 1
green = 1
apples = 1
red = 0
you = 0
Note that the words that do not occur in this particular sentence, but in other phrases the program is likely to encounter, also represent dimensions (for example the red = 0).
The big advantage of this approach is that the similarity of vectors can be easily computed, no matter how multi-dimensional they are. There are several techniques that estimate similarity, one of them is cosine similarity (see for example http://en.wikipedia.org/wiki/Cosine_similarity).
On a more general note, there are many other considerations to be made of course.
For example, some words might be utterly irrelevant to the message you want to convey, as in the following sentence:
I want you to output a weather report.
Here, at least "I", "you" "to" and "a" could be done away with without damaging the basic semantics of the sentence. Such words are called stop words and are discarded early in many tools that perform speech-to-text analysis.
Also note that we started out assuming that your program reliably identifies sound input. In reality, no tool is capable of infallibly identifying speech.
Humans tend to forget that sound actually exists without cues as to where word or sentence boundaries are. This makes so-called disambiguation of input a gargantuan task that is easily underestimated - and ambiguity one of the hardest problems of computational linguistics in general.
For that, the code won't be able to judge that! You need to split the command in text array! Such as
Tomorrow
Weather
What
This way, you will compare it with the text that is present in your computer! Lets say, with the command (what) with type (weather) and with the time (tomorrow).
It is better to read and understand each word, then guess it will work as Google! Google uses the same, they break down the string and compare it.

Better method of handling/reading these files (HCFA medical claim form)

I'm looking for some suggestions on better approaches to handling a scenario with reading a file in C#; the specific scenario is something that most people wouldn't be familiar with unless you are involved in health care, so I'm going to give a quick explanation first.
I work for a health plan, and we receive claims from doctors in several ways (EDI, paper, etc.). The paper form for standard medical claims is the "HCFA" or "CMS 1500" form. Some of our contracted doctors use software that allows their claims to be generated and saved in a HCFA "layout", but in a text file (so, you could think of it like being the paper form, but without the background/boxes/etc). I've attached an image of a dummy claim file that shows what this would look like.
The claim information is currently extracted from the text files and converted to XML. The whole process works ok, but I'd like to make it better and easier to maintain. There is one major challenge that applies to the scenario: each doctor's office may submit these text files to us in slightly different layouts. Meaning, Doctor A might have the patient's name on line 10, starting at character 3, while Doctor B might send a file where the name starts on line 11 at character 4, and so on. Yes, what we should be doing is enforcing a standard layout that must be adhered to by any doctors that wish to submit in this manner. However, management said that we (the developers) had to handle the different possibilities ourselves and that we may not ask them to do anything special, as they want to maintain good relationships.
Currently, there is a "mapping table" set up with one row for each different doctor's office. The table has columns for each field (e.g. patient name, Member ID number, date of birth etc). Each of these gets a value based on the first file that we received from the doctor (we manually set up the map). So, the column PATIENT_NAME might be defined in the mapping table as "10,3,25" meaning that the name starts on line 10, at character 3, and can be up to 25 characters long. This has been a painful process, both in terms of (a) creating the map for each doctor - it is tedious, and (b) maintainability, as they sometimes suddenly change their layout and then we have to remap the whole thing for that doctor.
The file is read in, line by line, and each line added to a
List<string>
Once this is done, we do the following, where we get the map data and read through the list of file lines and get the field values (recall that each mapped field is a value like "10,3,25" (without the quotes)):
ClaimMap M = ClaimMap.GetMapForDoctor(17);
List<HCFA_Claim> ClaimSet = new List<HCFA_Claim>();
foreach (List<string> cl in Claims) //Claims is List<List<string>>, where we have a List<string> for each claim in the text file (it can have more than one, and the file is split up into separate claims earlier in the process)
{
HCFA_Claim c = new HCFA_Claim();
c.Patient = new Patient();
c.Patient.FullName = cl[Int32.Parse(M.Name.Split(',')[0]) - 1].Substring(Int32.Parse(M.Name.Split(',')[1]) - 1, Int32.Parse(M.Name.Split(',')[2])).Trim();
//...and so on...
ClaimSet.Add(c);
}
Sorry this is so long...but I felt that some background/explanation was necessary. Are there any better/more creative ways of doing something like this?
Given the lack of standardization, I think your current solution although not ideal may be the best you can do. Given this situation, I would at least isolate concerns e.g. file read, file parsing, file conversion to standard xml, mapping table access etc. to simple components employing obvious patterns e.g. DI, strategies, factories, repositories etc. where needed to decouple the system from the underlying dependency on the mapping table and current parsing algorithms.
You need to work on the DRY (Don't Repeat Yourself) principle by separating concerns.
For example, the code you posted appears to have an explicit knowledge of:
how to parse the claim map, and
how to use the claim map to parse a list of claims.
So there are at least two responsibilities directly relegated to this one method. I'd recommend changing your ClaimMap class to be more representative of what it's actually supposed to represent:
public class ClaimMap
{
public ClaimMapField Name{get;set;}
...
}
public class ClaimMapField
{
public int StartingLine{get;set;}
// I would have the parser subtract one when creating this, to make it 0-based.
public int StartingCharacter{get;set;}
public int MaxLength{get;set;}
}
Note that the ClaimMapField represents in code what you spent considerable time explaining in English. This reduces the need for lengthy documentation. Now all the M.Name.Split calls can actually be consolidated into a single method that knows how to create ClaimMapFields out of the original text file. If you ever need to change the way your ClaimMaps are represented in the text file, you only have to change one point in code.
Now your code could look more like this:
c.Patient.FullName = cl[map.Name.StartingLine].Substring(map.Name.StartingCharacter, map.Name.MaxLength).Trim();
c.Patient.Address = cl[map.Address.StartingLine].Substring(map.Address.StartingCharacter, map.Address.MaxLength).Trim();
...
But wait, there's more! Any time you see repetition in your code, that's a code smell. Why not extract out a method here:
public string ParseMapField(ClaimMapField field, List<string> claim)
{
return claim[field.StartingLine].Substring(field.StartingCharacter, field.MaxLength).Trim();
}
Now your code can look more like this:
HCFA_Claim c = new HCFA_Claim
{
Patient = new Patient
{
FullName = ParseMapField(map.Name, cl),
Address = ParseMapField(map.Address, cl),
}
};
By breaking the code up into smaller logical pieces, you can see how each piece becomes very easy to understand and validate visually. You greatly reduce the risk of copy/paste errors, and when there is a bug or a new requirement, you typically only have to change one place in code instead of every line.
If you are only getting unstructured text, you have to parse it. If the text content changes you have to fix your parser. There's no way around this. You could probably find a 3rd party application to do some kind of visual parsing where you highlight the string of text you want and it does all the substring'ing for you but still unstructured text == parsing == fragile. A visual parser would at least make it easier to see mistakes/changed layouts and fix them.
As for parsing it yourself, I'm not sure about the line-by-line approach. What if something you're looking for spans multiple lines? You could bring the whole thing in a single string and use IndexOf to substring that with different indices for each piece of data you're looking for.
You could always use RegEx instead of Substring if you know how to do that.
While the basic approach your taking seems appropriate for your situation, there are definitely ways you could clean up the code to make it easier to read and maintain. By separating out the functionality that you're doing all within your main loop, you could change this:
c.Patient.FullName = cl[Int32.Parse(M.Name.Split(',')[0]) - 1].Substring(Int32.Parse(M.Name.Split(',')[1]) - 1, Int32.Parse(M.Name.Split(',')[2])).Trim();
to something like this:
var parser = new FormParser(cl, M);
c.PatientFullName = FormParser.GetName();
c.PatientAddress = FormParser.GetAddress();
// etc
So, in your new class, FormParser, you pass the List that represents your form and the claim map for the provider into the constructor. You then have a getter for each property on the form. Inside that getter, you perform your parsing/substring logic like you're doing now. Like I said, you're not really changing the method by which your doing it, but it certainly would be easier to read and maintain and might reduce your overall stress level.

Creating a character variation algorithm for a synonym table

I have a need to create a variation/synonym table for a client who needs to make sure if someone enters an incorrect variable, we can return the correct part.
Example, if we have a part ID of GRX7-00C. When the client enters this into a part table, they would like to automatically create a variation table that will store variations that this product could be. Like GBX7-OOC (letter O instead of number 0). Or if they have the number 1, to be able to use L or I.
So if we have part GRL8-OOI we could have the following associated to it in the variation table:
GRI8-OOI
GRL8-0OI
GRL8-O0I
GRL8-OOI
etc....
I currently have a manual entry for this, but there could be a ton of variations of these parts. So, would anyone have a good idea at how I can create a automatic process for this?
How can I do this in C# and/or SQL?
I'm not a C# programmer, but for other .NET languages it would make more sense to me to create a list of CHARACTERS that are similar, and group those together, and use RegEx to evaluate if it matches.
i.e. for your example:
Original:
GRL8-001
Regex-ploded:
GR(l|L|1)(8|b|B)-(0|o|O)(0|o|O)(1|l|L)
You could accomplish this by having a table of interchangeable characters and running a replace function to sub the RegEx for the character automatically.
Lookex function psuedocode (works like soundex but for look alike instead of sound alike)
string input
for each char c
if c in "O0Q" c = 'O'
else if c in "IL1" c = 'I'
etc.
compute a single Lookex code and store that with each product id. If user's entry doesn't match a product id, compute the Lookex code on their entry and search for all products having that code (there could be more than 1). This would consume minimal space, and be quite fast with a single index, and inexpensive to compute as well.
Given your input above, what I would do is not store a table of synonyms, but instead, have a set of rules checked against a master dictionary. So for example, if the user types in a value that is not found in the dictionary, change O to 0, and check for that existing in the dictionary. Change GR to GB and check for that. Etc. All the variations they want to allow described above can be explained as rules that you can apply one at a time or in combination and check if the resulting entry exists. That way you do not have to have a massive dictionary of synonyms to maintain and update.
I wouldn't go the synonym route at all.
I would cleanse all values in the database using a standard rule set.
For every value that exists, replace all '0's with 'O's, strip out dashes etc, so that for each real value you have only one modified value and store that in a seperate field\table.
Then I would cleanse the input the same way, and do a two-part match. Check the actual input string against the actual database values(this will get you exact matches), and secondly check the cleansed input against the cleansed values. Then order the output against the actual database values using a distance calc such as Levenshtein Distance to get the most likely match.
Now for the input:
GRL8-OO1
With parts:
GRL8-00I & GRL8-OOI
These would all normalize to the same value GRL8OOI, though the distance match would be closer for GRL8-OOI, so that would be your closest bet.
Granted this dramatically reduces the "uniqueness" of your part numbers, but the combo of the two-part match and the Levenshtein should get you what you are looking for.
There are several T-SQL implementations of Levenshtein available

C# better way to do this?

Hi I have this code below and am looking for a prettier/faster way to do this.
Thanks!
string value = "HelloGoodByeSeeYouLater";
string[] y = new string[]{"Hello", "You"};
foreach(string x in y)
{
value = value.Replace(x, "");
}
You could do:
y.ToList().ForEach(x => value = value.Replace(x, ""));
Although I think your variant is more readable.
Forgive me, but someone's gotta say it,
value = Regex.Replace( value, string.Join("|", y.Select(Regex.Escape)), "" );
Possibly faster, since it creates fewer strings.
EDIT: Credit to Gabe and lasseespeholt for Escape and Select.
While not any prettier, there are other ways to express the same thing.
In LINQ:
value = y.Aggregate(value, (acc, x) => acc.Replace(x, ""));
With String methods:
value = String.Join("", value.Split(y, StringSplitOptions.None));
I don't think anything is going to be faster in managed code than a simple Replace in a foreach though.
It depends on the size of the string you are searching. The foreach example is perfectly fine for small operations but creates a new instance of the string each time it operates because the string is immutable. It also requires searching the whole string over and over again in a linear fashion.
The basic solutions have all been proposed. The Linq examples provided are good if you are comfortable with that syntax; I also liked the suggestion of an extension method, although that is probably the slowest of the proposed solutions. I would avoid a Regex unless you have an extremely specific need.
So let's explore more elaborate solutions and assume you needed to handle a string that was thousands of characters in length and had many possible words to be replaced. If this doesn't apply to the OP's need, maybe it will help someone else.
Method #1 is geared towards large strings with few possible matches.
Method #2 is geared towards short strings with numerous matches.
Method #1
I have handled large-scale parsing in c# using char arrays and pointer math with intelligent seek operations that are optimized for the length and potential frequency of the term being searched for. It follows the methodology of:
Extremely cheap Peeks one character at a time
Only investigate potential matches
Modify output when match is found
For example, you might read through the whole source array and only add words to the output when they are NOT found. This would remove the need to keep redimensioning strings.
A simple example of this technique is looking for a closing HTML tag in a DOM parser. For example, I may read an opening STYLE tag and want to skip through (or buffer) thousands of characters until I find a closing STYLE tag.
This approach provides incredibly high performance, but it's also incredibly complicated if you don't need it (plus you need to be well-versed in memory manipulation/management or you will create all sorts of bugs and instability).
I should note that the .Net string libraries are already incredibly efficient but you can optimize this approach for your own specific needs and achieve better performance (and I have validated this firsthand).
Method #2
Another alternative involves storing search terms in a Dictionary containing Lists of strings. Basically, you decide how long your search prefix needs to be, and read characters from the source string into a buffer until you meet that length. Then, you search your dictionary for all terms that match that string. If a match is found, you explore further by iterating through that List, if not, you know that you can discard the buffer and continue.
Because the Dictionary matches strings based on hash, the search is non-linear and ideal for handling a large number of possible matches.
I'm using this methodology to allow instantaneous (<1ms) searching of every airfield in the US by name, state, city, FAA code, etc. There are 13K airfields in the US, and I've created a map of about 300K permutations (again, a Dictionary with prefixes of varying lengths, each corresponding to a list of matches).
For example, Phoenix, Arizona's main airfield is called Sky Harbor with the short ID of KPHX. I store:
KP
KPH
KPHX
Ph
Pho
Phoe
Ar
Ari
Ariz
Sk
Sky
Ha
Har
Harb
There is a cost in terms of memory usage, but string interning probably reduces this somewhat and the resulting speed justifies the memory usage on data sets of this size. Searching happens as the user types and is so fast that I have actually introduced an artificial delay to smooth out the experience.
Send me a message if you have the need to dig into these methodologies.
Extension method for elegance
(arguably "prettier" at the call level)
I'll implement an extension method that allows you to call your implementation directly on the original string as seen here.
value = value.Remove(y);
// or
value = value.Remove("Hello", "You");
// effectively
string value = "HelloGoodByeSeeYouLater".Remove("Hello", "You");
The extension method is callable on any string value in fact, and therefore easily reusable.
Implementation of Extension method:
I'm going to wrap your own implementation (shown in your question) in an extension method for pretty or elegant points and also employ the params keyword to provide some flexbility passing the arguments. You can substitute somebody else's faster implementation body into this method.
static class EXTENSIONS {
static public string Remove(this string thisString, params string[] arrItems) {
// Whatever implementation you like:
if (thisString == null)
return null;
var temp = thisString;
foreach(string x in arrItems)
temp = temp.Replace(x, "");
return temp;
}
}
That's the brightest idea I can come up with right now that nobody else has touched on.

Categories