I'm looking for some suggestions on better approaches to handling a scenario with reading a file in C#; the specific scenario is something that most people wouldn't be familiar with unless you are involved in health care, so I'm going to give a quick explanation first.
I work for a health plan, and we receive claims from doctors in several ways (EDI, paper, etc.). The paper form for standard medical claims is the "HCFA" or "CMS 1500" form. Some of our contracted doctors use software that allows their claims to be generated and saved in a HCFA "layout", but in a text file (so, you could think of it like being the paper form, but without the background/boxes/etc). I've attached an image of a dummy claim file that shows what this would look like.
The claim information is currently extracted from the text files and converted to XML. The whole process works ok, but I'd like to make it better and easier to maintain. There is one major challenge that applies to the scenario: each doctor's office may submit these text files to us in slightly different layouts. Meaning, Doctor A might have the patient's name on line 10, starting at character 3, while Doctor B might send a file where the name starts on line 11 at character 4, and so on. Yes, what we should be doing is enforcing a standard layout that must be adhered to by any doctors that wish to submit in this manner. However, management said that we (the developers) had to handle the different possibilities ourselves and that we may not ask them to do anything special, as they want to maintain good relationships.
Currently, there is a "mapping table" set up with one row for each different doctor's office. The table has columns for each field (e.g. patient name, Member ID number, date of birth etc). Each of these gets a value based on the first file that we received from the doctor (we manually set up the map). So, the column PATIENT_NAME might be defined in the mapping table as "10,3,25" meaning that the name starts on line 10, at character 3, and can be up to 25 characters long. This has been a painful process, both in terms of (a) creating the map for each doctor - it is tedious, and (b) maintainability, as they sometimes suddenly change their layout and then we have to remap the whole thing for that doctor.
The file is read in, line by line, and each line added to a
List<string>
Once this is done, we do the following, where we get the map data and read through the list of file lines and get the field values (recall that each mapped field is a value like "10,3,25" (without the quotes)):
ClaimMap M = ClaimMap.GetMapForDoctor(17);
List<HCFA_Claim> ClaimSet = new List<HCFA_Claim>();
foreach (List<string> cl in Claims) //Claims is List<List<string>>, where we have a List<string> for each claim in the text file (it can have more than one, and the file is split up into separate claims earlier in the process)
{
HCFA_Claim c = new HCFA_Claim();
c.Patient = new Patient();
c.Patient.FullName = cl[Int32.Parse(M.Name.Split(',')[0]) - 1].Substring(Int32.Parse(M.Name.Split(',')[1]) - 1, Int32.Parse(M.Name.Split(',')[2])).Trim();
//...and so on...
ClaimSet.Add(c);
}
Sorry this is so long...but I felt that some background/explanation was necessary. Are there any better/more creative ways of doing something like this?
Given the lack of standardization, I think your current solution although not ideal may be the best you can do. Given this situation, I would at least isolate concerns e.g. file read, file parsing, file conversion to standard xml, mapping table access etc. to simple components employing obvious patterns e.g. DI, strategies, factories, repositories etc. where needed to decouple the system from the underlying dependency on the mapping table and current parsing algorithms.
You need to work on the DRY (Don't Repeat Yourself) principle by separating concerns.
For example, the code you posted appears to have an explicit knowledge of:
how to parse the claim map, and
how to use the claim map to parse a list of claims.
So there are at least two responsibilities directly relegated to this one method. I'd recommend changing your ClaimMap class to be more representative of what it's actually supposed to represent:
public class ClaimMap
{
public ClaimMapField Name{get;set;}
...
}
public class ClaimMapField
{
public int StartingLine{get;set;}
// I would have the parser subtract one when creating this, to make it 0-based.
public int StartingCharacter{get;set;}
public int MaxLength{get;set;}
}
Note that the ClaimMapField represents in code what you spent considerable time explaining in English. This reduces the need for lengthy documentation. Now all the M.Name.Split calls can actually be consolidated into a single method that knows how to create ClaimMapFields out of the original text file. If you ever need to change the way your ClaimMaps are represented in the text file, you only have to change one point in code.
Now your code could look more like this:
c.Patient.FullName = cl[map.Name.StartingLine].Substring(map.Name.StartingCharacter, map.Name.MaxLength).Trim();
c.Patient.Address = cl[map.Address.StartingLine].Substring(map.Address.StartingCharacter, map.Address.MaxLength).Trim();
...
But wait, there's more! Any time you see repetition in your code, that's a code smell. Why not extract out a method here:
public string ParseMapField(ClaimMapField field, List<string> claim)
{
return claim[field.StartingLine].Substring(field.StartingCharacter, field.MaxLength).Trim();
}
Now your code can look more like this:
HCFA_Claim c = new HCFA_Claim
{
Patient = new Patient
{
FullName = ParseMapField(map.Name, cl),
Address = ParseMapField(map.Address, cl),
}
};
By breaking the code up into smaller logical pieces, you can see how each piece becomes very easy to understand and validate visually. You greatly reduce the risk of copy/paste errors, and when there is a bug or a new requirement, you typically only have to change one place in code instead of every line.
If you are only getting unstructured text, you have to parse it. If the text content changes you have to fix your parser. There's no way around this. You could probably find a 3rd party application to do some kind of visual parsing where you highlight the string of text you want and it does all the substring'ing for you but still unstructured text == parsing == fragile. A visual parser would at least make it easier to see mistakes/changed layouts and fix them.
As for parsing it yourself, I'm not sure about the line-by-line approach. What if something you're looking for spans multiple lines? You could bring the whole thing in a single string and use IndexOf to substring that with different indices for each piece of data you're looking for.
You could always use RegEx instead of Substring if you know how to do that.
While the basic approach your taking seems appropriate for your situation, there are definitely ways you could clean up the code to make it easier to read and maintain. By separating out the functionality that you're doing all within your main loop, you could change this:
c.Patient.FullName = cl[Int32.Parse(M.Name.Split(',')[0]) - 1].Substring(Int32.Parse(M.Name.Split(',')[1]) - 1, Int32.Parse(M.Name.Split(',')[2])).Trim();
to something like this:
var parser = new FormParser(cl, M);
c.PatientFullName = FormParser.GetName();
c.PatientAddress = FormParser.GetAddress();
// etc
So, in your new class, FormParser, you pass the List that represents your form and the claim map for the provider into the constructor. You then have a getter for each property on the form. Inside that getter, you perform your parsing/substring logic like you're doing now. Like I said, you're not really changing the method by which your doing it, but it certainly would be easier to read and maintain and might reduce your overall stress level.
Related
I need guidance, someone to point me in the right direction. As the tittle says, I need to save information to a file: Date, string, integer and an array of integers. And I also need to be able to access that information later, when an user wants to review it.
Optional: File is plain text and I can directly check it and it is understandable.
Bonus points if chosen method can be "easily" converted to working with a database in the future instead of individual files.
I'm pretty new to C# and what I've found so far is that I should turn the array into a string with separators.
So, what'd you guys suggest?
// JSON.Net
string json = JsonConvert.SerializeObject(objOrArray);
File.WriteAllText(path, json);
// (note: can also use File.Create etc if don't need the string in memory)
or...
using(var file = File.Create(path)) { // protobuf-net
Serializer.Serialize(file, objOrArray);
}
The first is readable; the second will be smaller. Both will cope fine with "Date, string, integer and an array of integers", or an array of such objects. Protobuf-net would require adding some attributes to help it, but really simple.
As for working with a database as columns... the array of integers is the glitch there, because most databases don't support "array of integers" as a column type. I'd say "separation of concerns" - have a separate model for DB persistence. If you are using the database purely to store documents, then: pretty much every DB will support CLOB and BLOB data, so either is usable. Many databases now have inbuilt JSON support (helper methods, etc), which might make JSON as a CLOB more tempting.
I would probably serialize this to json and save it somewhere. Json.Net is a very popular way.
The advantage of this is also creating a class that can be later used to work with an Object-Relational Mapper.
var userInfo = new UserInfoModel();
// write the data (overwrites)
using (var stream = new StreamWriter(#"path/to/your/file.json", append: false))
{
stream.Write(JsonConvert.SerializeObject(userInfo));
}
//read the data
using (var stream = new StreamReader(#"path/to/your/file.json"))
{
userInfo = JsonConvert.DeserializeObject<UserInfoModel>(stream.ReadToEnd());
}
public class UserInfoModel
{
public DateTime Date { get; set; }
// etc.
}
for the Plaintext File you're right.
Use 1 Line for each Entry:
Date
string
Integer
Array of Integer
If you read the File in your code you can easily seperate them by reading line to line.
Make a string with a specific Seperator out of the Array:
[1,2,3] -> "1,2,3"
When you read the line you can Split the String by "," and gets a Array of Strings. Parse each Entry to int into an Array of Int with the same length.
How to read and write the File get a look at Easiest way to read from and write to files
If you really wants the switch to a database at a point, try a JSON Format for your File. It is easy to handle and there are some good Plugins to work with.
Mfg
Henne
The way I got started with C# is via the game Space Engineers from the Steam Platform, the Mods need to save a file Locally (%AppData%\Local\Temp\SpaceEngineers\ or %AppData%\Roaming\SpaceEngineers\Storage\) for various settings, and their logging is similar to what #H. Sandberg mentioned (line by line, perhaps a separator to parse with later), the upside to this is that it's easy to retrieve, easy to append, easy to overwrite, and I'm pretty sure it's even possible to retrieve File Size, which when combined with File Deletion and File Creation can prevent runaway File Sizes as this allows you to set an Upper Limit to check against, allowing you to run it on a Server with minimal impact (probably best to include a minimum Date filter {make sure X is at least Y days old before deleting it for being over Z Bytes} to prevent Debugging Data Loss {"Why was it over that limit?"})
As far as the actual Code behind the idea, I'm approximately at the same Skill Level as the OP, which is to say; Rookie, but I would advise looking at the Coding in the Space Engineers Mods for some Samples (plus it's not half bad for a Beta Game), as they are almost all written in C#. Also, the Programmable Blocks compile in C# as well, so you'll be able to use that to both assist in learning C# and reinforce and utilize what you already know (although certain C# commands aren't allowed for security reasons, utilizing the Mod API you'll have more flexibility to do things such as Creating/Maintaining Log Files, Retrieving/Modifying Object Properties, etc.), You are even capable of printing Text to various in Game Text Monitors.
I apologise if my Syntax needs some work, and I'm sorry I am not currently capable of just whipping up some Code to solve your issue, but I do know
using System;
Console.WriteLine("Hello World");
so at least it's not a total loss, but my example Code likely won't compile, since it's likely missing things like: an Output Location, perhaps an API reference or two, and probably a few other settings. Like I said, I'm New, but that is a valid C# Command, I know I got that part correct.
Edit: here's a better attempt:
using System;
class Test
{
static void Main()
{
string a = "Hello Hal, ";
string b = "Please open the Airlock Doors.";
string c = "I'm sorry Dave, "
string d = "I'm afraid I can't do that."
Console.WriteLine(a + b);
Console.WriteLine(c + d);
Console.Read();
}
}
This:
"Hello Hal, Please open the Airlock Doors."
"I'm sorry Dave, I'm afraid I can't do that."
Should be the result. (the "Quotation Marks" shouldn't appear in the readout {the last Code Block}, that's simply to improve readability)
I am looking for the best data structure for the following case:
In my case I will have thousands of strings, however for this example I am gonna use two for obvious reasons. So let's say I have the strings "Water" and "Walter", what I need is when the letter "W" is entered both strings to be found, and when "Wat" is entered "Water" to be the only result. I did a research however I am still not quite sure which is the correct data structure for this case and I don't want to implement it if I am not sure as this will waste time. So basically what I am thinking right now is either "Trie" or "Suffix Tree". It seems that the "Trie" will do the trick but as I said I need to be sure. Additionally the implementation should not be a problem so I just need to know the correct structure. Also feel free to let me know if there is a better choice. As you can guess normal structures such as Dictionary/MultiDictionary would not work as that will be a memory killer. I am also planning to implement cache to limit the memory consumption. I am sorry there is no code but I hope I will get a answer. Thank you in advance.
You should user Trie. Tries are the foundation for one of the fastest known sorting algorithms (burstsort), it is also used for spell checking, and is used in applications that use text completion. You can see details here.
Practically, if you want to do auto suggest, then storing upto 3-4 chars should suffice.
I mean suggest as and when user types "a" or "ab" or "abc" and the moment he types "abcd" or more characters, you can use map.keys starting with "abcd" using c# language support lamda expressions.
Hence, I suggest, create a map like:
Map<char, <Map<char, Map<char, Set<string>>>>> map;
So, if user enters "a", you look for map[a] and finds all children.
I am thinking a library already exists for this, but I need allow my users to create a numbering format for their documents.
For example, let's say we have an RFI from and the user has a specific format the numbering sequence needs to be in. A typical RFI number looks like this for their system: R0000100. The next RFI in line would be R0000101.
Before I set out to creating a formatting engine for numbers such as these, does something already exist that can accommodate this?
Update:
I failed to save the edit to this question. Anyway, I also want to give the users the ability to create their own formats. So, I may have a form where they can input the format: R####### And also allow them to specify the starting integer: in the case 100. Also, I may want to allow them to specify how they want to increment. maybe only by 100s. So the next number may be R0000200. I know this may sound ridiculous, but you never know. That is why I asked if something like this already exists.
If you keep value and format separated, you won't need a library or such a thing.
The numbers would be simple, say, integers i, i.e. 100, 101, 102, that you manage/store however you see fit. The formatting part would simply be a matter of R + i.ToString("0000000"), or if you want to have the format as a string literal string.Format("R{0:0000000}", i).
I know, this might only be an example, but as your question stands, the formatting options, that .NET provides out of the box seem to suffice.
The incrementing of identity field values is most often handled in an RDBMS-style database. This comes with a few benefits, such as built-in concurrency handling. If you want to generate the values yourself, a simple class to get the last-issued value and increment by one would be very easy to create. Make it thread-safe so you don't get any duplicates or gaps and you'll be good to go.
for part of a small assignment I have, i've been asked to create an array to store names and addresses taken from input that the user gives and to be able to later delete a name and address from the array.
Any help or links to helping me understand how to achieve this would be highly appreaciated, thanks.
EDIT - The array is to be set up like an address book, and when printed to the screen it displays like so: "Bloggs, Joe"
It must be surname then forename. I know how to acquire and store the information the user will give, being their names and addresses, but I am stuck on how to add this into an array. The array doesn't have to be infinite, as I am supposed to allocate the array whatever size I wish.
At the start of the program it will be part of, the user will be given a menu, and they can choose to add a record, delete a record or print the book to the screen. So i am meant to be using methods where suitable.
Well, to start with, an array is the wrong data structure to use here.
Arrays are always a fixed size - whereas you want to be able to add elements and later remove them. Assuming you're using C# 2 or higher, you should probably use a List<T>.
Now, the next thing is to work out what T should be. It sounds like you want to store details of people - so you should create a Person class (or perhaps Contact) to encapsulate the name and address... that way you can have a List<Person>.
The next task is probably to work out how to ask the user for input and convert that input into an instance of Person.
Basically, break the task up into small bits - and then feel free to ask questions about any specific bits which you find hard.
I seem to remember this exact same assignment from my CS classes.
The prof wanted us to use linked lists. As John Skeet points out above, .NET has List<T>, which is basically a linked list (with the added feature of being able to be reference each item by index like an array)
You can use a Serializer for the saving part.
Check out the BinaryFormatter class and XmlSerializer.
XmlSerializer is preferred because the file is human-readable and efficiency is usually less important considering the type and purpose of your app.
Using XmlSerializer is as simple as:
var filename = "c:\....\addressbook.xml";
if (File.Exists(filename))
File.Delete(filename);
using (var sw = new StreamWriter(filename))
{
var xs = new XmlSerializer(typeof(List<Person>));
xs.Serialize(sw, myAddressBook);
}
I am working on an ASPX page that needs to handle multiple different kinds of data. I came up with a potentially ideal fashion to fetch the information I need, but am unsure if it is as good an idea as it feels. Basically, I need to filter a set into a subset, but which values I filter by will differ by circumstance. I constructed the following code snippet that seems to work fine.
List<string> lStr = new List<string>() {
"Category",
"Document Number", //Case 1 Only
"Document Title", //Case 1 Only
"Picture Title", //Case 2 Only
"Picture Number", //Case 2 Only
"Issue",
"Issue Date",
"Issue Title",
"Notes",
"High Priority" //Case 1 Only
};
AddControls(bigDataInput.Fields.OfType<FieldObject>().Where(x => lStr.Contains(x.Title)).ToArray());
bigDataInput is an object that has a property called Fields, which is a Collection of objects called FieldObject. I need to get a subset of these FieldObjects based on their title, and pass all of them into a method AddControls(params FieldObject[] fields). The issue is that which titles I need to filter by will differ based on the bigDataInput itself. There are only two case scenarios currently, and these are the fields I need to filter out.
bigDataInput Case 1: Category, Document Number, Document Title, Issue, Issue Date, Issue Title, Notes, High Priority
bigDataInput Case 2: Category, Picture Title, Picture Number, Issue, Issue Date, Issue Title, Notes
The bigDataInput will have additional fields besides those that I need. However, the field collection will only have one of the filtered fields if and only if I will actually need the field for that particular case. For example, Case 1 does not have the Picture Title and Picture Number fields, and Case 2 does not have the Document Number, Document Title, and High Priority fields. This restriction will also apply to all future cases, however many there might be.
I at first considered constructing a List based on the specific case scenario, but the switch case to build it could get quite large, and repetitive to a degree. That's when I came up with the idea for the above code snippet. But is this actually a good idea? Or is there a better method than this but much more concise than a potentially humongous switch case?
It's hard to wrap my head around your exact issue, but from what it got it might make sense to attack this a little differently using some design patterns. One that comes to mind that might make sense is the Strategy pattern. Basically, this is a way of encapsulating the algorithm (logic) from its host.
Wikipedia has an entry on the Strategy pattern here: http://en.wikipedia.org/wiki/Strategy_pattern
I'm thinking about a flow somewhat like this:
You have an interface called IDataInputTransformer with 2 methods: bool AcceptsInput(bigDataInput i) and FieldObject[] TransformInput(bigDataInput i)
The calling class has an IEnumerable of IDataInputTransformers that is set somehow -- either manually on instantiation or using dependency injection or something
Upon being passed a bigDataInput, the calling class iterates over each IDataInputTransformer and calls AcceptsInput with the bigDataInput. If the input is not accepted it just tries the next & the next (if none accept the input perhaps an exception is thrown or something). If it the IDataInputTransformer does accept the input then you can call TransformInput and get the FieldObject[] that will be passed to AddControls
You could take this further, but this is the basic idea. The benefits here are:
Readability -- it's really easy to
read
FinanceDepartmentInputTransformer.TransformInput(),
it's really hard to read a huge
switch
Able to change -- you can add new InputTransformers easily (and it won't ruin the readability or functionality)
Rules are isolated -- you can easily check the business rules in the AcceptsInput methods
Portability -- you could use this elsewhere too
Testability -- you can easily unit test this to make sure it works right