Background
We currently have an excel-based system for the creation of specifications for a sound and lighting rental company.
Part of this is a column in the excel sheet called 'autospec', which is made up of Excel formulas for individual stock items ( e.g., you specify a loudspeaker, and then the formulas in the autospec column calculate the cables you need and add them onto the specification automatically.
sample formula
10m microphone cable might have a formula like the following:
=IF([loudspeakers]>0,[loudspeakers]*2,0)+([mixing desk]*4)
We're now moving over to a proper database with a C# front end.
Question
What I would like is to be able to store the autospec formulas for each stock item in a table, and when the user specs an item the front end, the program should find the relevant formula, execute it, and change the spec quantity as appropriate.
Bottom line: I need to execute code contained within a string.
Am I going about this the wrong way?
Is there a better way?
I asked a similar question once: How can I evaluate a C# expression dynamically?
You could probably use that to evaluate those expressions. This not really a safe thing to do, unless you can guarantee nobody will be adding junk (read: evil code) to your database.
If you can break down the formulas into "families", such that each entry in your formula column is a member of a small (5-10) set of formulas with just different parameters, you could try something like this:
[ItemTable]<-[ItemFormulaParameters(param1, param2, param3)]->[FormulaTable(name)]
And have a factory method for instantiating the formula objects by name. Each such formula object has a "calculate(param1, param2, param3)" property...
Either build you're own code interprenter or build a 'dynamic rule engine' that creates some binary representation of c# rules that can be executed by a 'rule engine' you'd have to write.
Related
I'm writing a program to process a cube using Microsoft Analysis, and I want
bit values to be shown as True/False whenever I browse the cube (lets say in Excel).
I've managed to show null values as some string using 'Unknown', but I can't figure how to do the same with the actual values.
P.S. I do not want to convert it by using a CASE query in the DSV table creation. I want it to be only at the Dimension level.
Thanks.
You might "not want" to rely on a CASE statement in the DSV, but SSAS doesn't give you any other options to achieve what you want. So hold your nose and start coding...
FWIW - I prefer a SQL View (rather than calculations in a DSV) for more re-use and easier testing.
I want to do a super fast geocode lookup, returning co-ordinates for an input of Town, City or Country. My knowledge is basic but from what I understand writing it in C is a good start. I was thinking it makes sense to have a tree structure like this:
England
Kent
Orpington
Chatam
Rochester
Dover
Edenbridge
Wiltshire
Swindon
Malmsbury
In my file / database I will have the co-ordinate and the town/city name. If give my program the name "Kent" I want a program that can return me the co-ordinate assoaited with "Kent" in the fastest way possible
Should I store the data in a binary file or a SQL database for performance reasons?
What is the best method of searching this data? Perhaps binary tree searching?
How should the data be stored? perhaps?
Here's a little advice, but not much more than that:
If you want to find places by name, or name prefix, as you indicate that you wish to, then you would be ill-advised to set up a data structure which stores the data in a hierarchy of country, region, town as you suggest you might. If you have an operation that dominates the use of your data structure you are generally best picking the data structure to suit the operation.
In this case an alphabetical list of places would be more suited to your queries. To each place not at the topmost level you would want to add some kind of reference to the name of its 'parent'. If you have an alphabetical list of places you might also want to consider an index , perhaps one which points directly to the first place in the list which starts with each letter of the alphabet.
As you describe your problem it seems to have much more in common with storing words in a dictionary (I mean the sort of thing in which you look up words rather than any particular collection data-type in any specific programming language which goes under the same name) than with most of what goes under the guise of geo-coding.
My guess would be that a gazetteer including the names of all the world's towns, cities, regions and countries (and their coordinates) which have a population over, say, 1000, could be stored in a very simple data structure (basically a list) with an index or two for rapid location of the first A place-name, the first B, and so on. With a little compression you could probably hold this in the memory of most modern desktop PCs.
I think the best advice I can give is to use whatever language you are familiar with to get the results you want. Worry about performance once your code works. Then you can look at translating very specific pieces of functionality into C or C++ one at a time until you have the results you want.
You should not worry about how the information is stored, except not to duplicate data.
You should create one or more indices for the data. The indicies are associative arrays / maps data structures that contain a key (the item you want to search) and a value (such as the record and other information associated with the key). This will enable you with fast lookups without altering your data for each type of search.
On the other hand, your case is an excellent fit for a data base. I suggest you let the database manager your data (such as efficient lookups). After all, that is what they live for.
See also: At what point is it worth using a database?
I'm looking for some suggestions on better approaches to handling a scenario with reading a file in C#; the specific scenario is something that most people wouldn't be familiar with unless you are involved in health care, so I'm going to give a quick explanation first.
I work for a health plan, and we receive claims from doctors in several ways (EDI, paper, etc.). The paper form for standard medical claims is the "HCFA" or "CMS 1500" form. Some of our contracted doctors use software that allows their claims to be generated and saved in a HCFA "layout", but in a text file (so, you could think of it like being the paper form, but without the background/boxes/etc). I've attached an image of a dummy claim file that shows what this would look like.
The claim information is currently extracted from the text files and converted to XML. The whole process works ok, but I'd like to make it better and easier to maintain. There is one major challenge that applies to the scenario: each doctor's office may submit these text files to us in slightly different layouts. Meaning, Doctor A might have the patient's name on line 10, starting at character 3, while Doctor B might send a file where the name starts on line 11 at character 4, and so on. Yes, what we should be doing is enforcing a standard layout that must be adhered to by any doctors that wish to submit in this manner. However, management said that we (the developers) had to handle the different possibilities ourselves and that we may not ask them to do anything special, as they want to maintain good relationships.
Currently, there is a "mapping table" set up with one row for each different doctor's office. The table has columns for each field (e.g. patient name, Member ID number, date of birth etc). Each of these gets a value based on the first file that we received from the doctor (we manually set up the map). So, the column PATIENT_NAME might be defined in the mapping table as "10,3,25" meaning that the name starts on line 10, at character 3, and can be up to 25 characters long. This has been a painful process, both in terms of (a) creating the map for each doctor - it is tedious, and (b) maintainability, as they sometimes suddenly change their layout and then we have to remap the whole thing for that doctor.
The file is read in, line by line, and each line added to a
List<string>
Once this is done, we do the following, where we get the map data and read through the list of file lines and get the field values (recall that each mapped field is a value like "10,3,25" (without the quotes)):
ClaimMap M = ClaimMap.GetMapForDoctor(17);
List<HCFA_Claim> ClaimSet = new List<HCFA_Claim>();
foreach (List<string> cl in Claims) //Claims is List<List<string>>, where we have a List<string> for each claim in the text file (it can have more than one, and the file is split up into separate claims earlier in the process)
{
HCFA_Claim c = new HCFA_Claim();
c.Patient = new Patient();
c.Patient.FullName = cl[Int32.Parse(M.Name.Split(',')[0]) - 1].Substring(Int32.Parse(M.Name.Split(',')[1]) - 1, Int32.Parse(M.Name.Split(',')[2])).Trim();
//...and so on...
ClaimSet.Add(c);
}
Sorry this is so long...but I felt that some background/explanation was necessary. Are there any better/more creative ways of doing something like this?
Given the lack of standardization, I think your current solution although not ideal may be the best you can do. Given this situation, I would at least isolate concerns e.g. file read, file parsing, file conversion to standard xml, mapping table access etc. to simple components employing obvious patterns e.g. DI, strategies, factories, repositories etc. where needed to decouple the system from the underlying dependency on the mapping table and current parsing algorithms.
You need to work on the DRY (Don't Repeat Yourself) principle by separating concerns.
For example, the code you posted appears to have an explicit knowledge of:
how to parse the claim map, and
how to use the claim map to parse a list of claims.
So there are at least two responsibilities directly relegated to this one method. I'd recommend changing your ClaimMap class to be more representative of what it's actually supposed to represent:
public class ClaimMap
{
public ClaimMapField Name{get;set;}
...
}
public class ClaimMapField
{
public int StartingLine{get;set;}
// I would have the parser subtract one when creating this, to make it 0-based.
public int StartingCharacter{get;set;}
public int MaxLength{get;set;}
}
Note that the ClaimMapField represents in code what you spent considerable time explaining in English. This reduces the need for lengthy documentation. Now all the M.Name.Split calls can actually be consolidated into a single method that knows how to create ClaimMapFields out of the original text file. If you ever need to change the way your ClaimMaps are represented in the text file, you only have to change one point in code.
Now your code could look more like this:
c.Patient.FullName = cl[map.Name.StartingLine].Substring(map.Name.StartingCharacter, map.Name.MaxLength).Trim();
c.Patient.Address = cl[map.Address.StartingLine].Substring(map.Address.StartingCharacter, map.Address.MaxLength).Trim();
...
But wait, there's more! Any time you see repetition in your code, that's a code smell. Why not extract out a method here:
public string ParseMapField(ClaimMapField field, List<string> claim)
{
return claim[field.StartingLine].Substring(field.StartingCharacter, field.MaxLength).Trim();
}
Now your code can look more like this:
HCFA_Claim c = new HCFA_Claim
{
Patient = new Patient
{
FullName = ParseMapField(map.Name, cl),
Address = ParseMapField(map.Address, cl),
}
};
By breaking the code up into smaller logical pieces, you can see how each piece becomes very easy to understand and validate visually. You greatly reduce the risk of copy/paste errors, and when there is a bug or a new requirement, you typically only have to change one place in code instead of every line.
If you are only getting unstructured text, you have to parse it. If the text content changes you have to fix your parser. There's no way around this. You could probably find a 3rd party application to do some kind of visual parsing where you highlight the string of text you want and it does all the substring'ing for you but still unstructured text == parsing == fragile. A visual parser would at least make it easier to see mistakes/changed layouts and fix them.
As for parsing it yourself, I'm not sure about the line-by-line approach. What if something you're looking for spans multiple lines? You could bring the whole thing in a single string and use IndexOf to substring that with different indices for each piece of data you're looking for.
You could always use RegEx instead of Substring if you know how to do that.
While the basic approach your taking seems appropriate for your situation, there are definitely ways you could clean up the code to make it easier to read and maintain. By separating out the functionality that you're doing all within your main loop, you could change this:
c.Patient.FullName = cl[Int32.Parse(M.Name.Split(',')[0]) - 1].Substring(Int32.Parse(M.Name.Split(',')[1]) - 1, Int32.Parse(M.Name.Split(',')[2])).Trim();
to something like this:
var parser = new FormParser(cl, M);
c.PatientFullName = FormParser.GetName();
c.PatientAddress = FormParser.GetAddress();
// etc
So, in your new class, FormParser, you pass the List that represents your form and the claim map for the provider into the constructor. You then have a getter for each property on the form. Inside that getter, you perform your parsing/substring logic like you're doing now. Like I said, you're not really changing the method by which your doing it, but it certainly would be easier to read and maintain and might reduce your overall stress level.
I am thinking a library already exists for this, but I need allow my users to create a numbering format for their documents.
For example, let's say we have an RFI from and the user has a specific format the numbering sequence needs to be in. A typical RFI number looks like this for their system: R0000100. The next RFI in line would be R0000101.
Before I set out to creating a formatting engine for numbers such as these, does something already exist that can accommodate this?
Update:
I failed to save the edit to this question. Anyway, I also want to give the users the ability to create their own formats. So, I may have a form where they can input the format: R####### And also allow them to specify the starting integer: in the case 100. Also, I may want to allow them to specify how they want to increment. maybe only by 100s. So the next number may be R0000200. I know this may sound ridiculous, but you never know. That is why I asked if something like this already exists.
If you keep value and format separated, you won't need a library or such a thing.
The numbers would be simple, say, integers i, i.e. 100, 101, 102, that you manage/store however you see fit. The formatting part would simply be a matter of R + i.ToString("0000000"), or if you want to have the format as a string literal string.Format("R{0:0000000}", i).
I know, this might only be an example, but as your question stands, the formatting options, that .NET provides out of the box seem to suffice.
The incrementing of identity field values is most often handled in an RDBMS-style database. This comes with a few benefits, such as built-in concurrency handling. If you want to generate the values yourself, a simple class to get the last-issued value and increment by one would be very easy to create. Make it thread-safe so you don't get any duplicates or gaps and you'll be good to go.
There is a .NET opensource library called NPOI that allows you to manipulate Excel files. Unfortunately, their ShiftRows function does not adjust any cell references in formulas.
Therefore, I need to create a regex pattern to update them. Take for example a cell containing the following formula:
=(B7/C9) * (A10-B4)
I would like to bump any row references by 1 thus becoming
=(B8/C10) * (A11-B5)
Basically, I just need a pattern that will extract the numbers out into a "MatchCollection". I can do the rest.
Can anyone help?
Thanks.
Take a look at my answer to another question: Which regular expression is able to select excel column names in a formula in C#?
In that answer I match formulas and include code to increment the cell number (and column naming). Also, do a search on Excel column naming here and you'll find other ways to get the names generated or incremented. I probably could shorten the IncrementColumn MatchEvaluator's code using one of those methods but that's what I came up with at the time.
You should be very careful using regular expressions for this purpose.
Excel formulas can become very complex, especially with user-defined functions or when functions apply to ranges.
NPOI contains a Formula evaluation library and I think you should have a look at this as well - especially if the spreadsheets are out of your control.