Import textfile with weird format in C#

Import textfile with weird format in C# - c#

I have this exported file of some weird (standard for this industry!) format, which I need to import into
our Database. The file basically looks like this:
DATRKAKT-START
KAKT_LKZ "D"
KAKT_DAT_STAMM "1042665"
DATRKAIB-START
KAIB_AZ "18831025"
KAIB_STATUS_FM 2
KAIB_KZ_WAE "E"
DATRKAIB-END
DATRKARP-START
KARP_MELD "831025"
KARP_ST_MELD "G"
DATRKARP-END
...
DATRKAKT-END
There are 56 sections with a total of 1963 different rows, so I'm really not into
creating 56 classes with 1963 properties... How would you handle this file
so that you can access some property like it were an object?
Datrkaib.Kaib_Status_Fm
Datrkarp.karp_St_Meld

Unless your programming language lets you add methods to classes at runtime or lets classes respond to calls to undefined methods, there's no way you can do this. The thing is, even if C# did let you do this, you would lose type safety and Intellisense help anyway (presumably among the reasons for wanting it to work like that), so why not just go ahead and read it into some data structure? My inclination would be a hash which can contain values or other hashes, so you'd get calls like (VB):
Datrkakt("Lkz")
Datrkakt("Dat_Stam")
Datrkakt("Kaib")("Az")
Datrkakt("Kaib")("Status_Fm")
Or if you know all the data items are uniquely named as in your example, just use one hash:
Datr("Kakt_Lkz")
Datr("Kakt_Dat_Stam")
Datr("Kaib_Az")
Datr("Kaib_Status_Fm")
You can get back Intellisense help by creating an enum of all the data item names and getting:
Datr(DatrItem.KAKT_LKZ)
Datr(DatrIrem.KAIB_STATUS_FM)

It looks like structured data - I'd run search and replace and convert it to a simple xml. and then import.
The if you want to generate a code file off of it - consider codesmith - I think it can do this.

I'd go with a List <name, list> of various object, that can be a tuple <name, value> or a named list of object.

There isn't that will automatically do this for you.
I would create a class containing all the appropriate properties (say DatrDocument), and create a DatrReader class (similar idea to the XmlDocument/XmlReader classes).
The DatrReader will need to read the contents of the file or stream, and parse it into a DatrDocument.
You may also want to write a DatrWriter class which will take a DatrDocument and write it to a stream.

Related

What's the difference between Entity.Attributes and Entity.FormattedValues?

I'm learning how to write custom workflows and am trying to figure out where and in which format all the values I need are stored. I have noticed that I can access Entity instance data in both the Attributes and FormattedValues properties. How do I know when to use which one?
I have noticed MSDN's remark "Entity formatted values are only available on a retrieve operation, not on an update operation.".
For testing I've made two foreach-blocks iterating through both collections. Attributes gives me 65 lines and FormattedValues gives me 39. I can see that, yes, the output from FormattedValues is indeed formatted.
For example, where Attributes gives the output "Microsoft.Xrm.Sdk.OptionSetValue", FormattedValues gives me a string with the actual value.
Which values/attributes are generally excluded from the FormattedValues collection and why?

I'm not 100% sure about this, but the formatted values are the values you will be able to see on the form. In that list you will be able to find money types with the $ symbol, or the labels of the option sets. A text field shouldn't be shown since is already human-readable.
https://community.dynamics.com/crm/b/crmmitchmilam/archive/2013/04/18/crm-sdk-nugget-entity-formattedvalues-property.aspx
Refer to this article to know a little bit more about it. I rareley using that attribute list since the data is in string format. I found it really useful to retrieve the OprionSet lables.

After a quick check it'd appear that the difference between an attribute and a formatted value is that the former to the actual value stored in the DB (or at least the value that was stored there at the occasion of fetching it) while the latter serves what's shown to the user.
I haven't used formatted values but until proven otherwise, I'd say that an entity's attribute will provide you with the typed value that the regarded field is based on (be that int, DateTime and such), whereas its formatted value is the rendered, stringified representation of that value (being dependent on e.g. what form you're referring to, what language etc.)
By that logic, I'd expect the set of formatted values to be a subset to the set of attributes. Also, it should be constituted of exclusively String typed values, while the counterpart is a member of the type conversion table.
An example of the difference I can think of is an option set called picky with the currently selected option named "hazaa" and the ID of 1234. The below sample is written by heart so feel free to correct. It exemplifies the point, though: plainValue will be an integer equal to 1234, while formattedValue will be "hazaa".
int plainValue = (int)entity["picky"];
String formattedValue = (String)entity.FormattedValues["picky"];
I'd say that the attributive approach is more reliable as it'll render the actual values, while the alternative can lead to unexpected outcome. However, there's a certain convenience to it, I must add.
Personally, I'd recommend to look into the method GetAttributeValue<T>(String) or, as every cocky CRM developer would - have your own class with extension methods and use the method Get<T>(T,String) in there. That provides you with the sense of control and offers the best predictability and readability, IMAO.

Recursively print the keys and values of a Hashtable, that includes HashTables and ArrayLists

This might be stupid/overly complicated/near impossible, but...
I have a JSON file. I used a C# class (http://bit.ly/1bl73Ji) to parse the file into a HashTable.
So, I now have a HashTable. This HashTable has Keys and Values. But, some of these Values are ArrayLists or HashTables. Sometimes these ArrayLists contain HashTables, sometimes the HashTables contain Hashtables, and so on and so forth...
I'm basically just trying to print these keys and values into a file, just using simple indentation to differentiate, e.g.
dateLastActivity
2013-07-01T13:50:51.746Z
members
memberType
normal
avatarHash
5f9a6a60b6e669e81ed3a886ae
confirmed
True
status
active
url
www.awebsite.com
id
4fcde962a1b057c46607c1
initials
AP
username
ap
fullName
A Person
bio
Software developer
I need to recursivley go through the Hashtable, checking if the value is an ArrayList or HashTable, and keep checking and checking until it's just a string value. I've had a crack at it but I just can't seem to wrap my head around the recursive-ness.
Can anybody help? Even if somebody can think of a better way or if I should just abandon hope, I'd like to hear it!
Thanks in advance!

There is a Library for that
I honestly would just use Json.NET, that is what I use to deal with my Json in pretty much all of my projects, it is kept up to date and works extremely well.
It also is good if you want a file format that is easy to humanread, to serialize (serialize means to transform into a data representation) your data and have a human edit it later on. It comes with the ability to Serialize pretty much most object out-of-the-box.
For most objects all you have to do is this statement to serialize your data into a string.
string s = JsonConverter.Serialize(theObject);
//assume you have a write file method
WriteFile(filename, s);
that's it to serialize in most cases and to deserialize is just as easy
string content = ReadFile(filename);
MyObject obj = JsonConverter.Deserialize(content);
the library also includes support for anonymous objects, allows you to specify how your object is parsed with attributes, and a lot of other cool features, while also being faster then the built in .NET version

Pseudo-code answer:
main(){
recurse(0, myHashTable)
}
recurse(int level, Structure tableOrArray){
for element x in tableOrArray{
if(x is a string) print (level)*indent + x;
if(x is not a string) recurse(level + 1, x)
}
}

parse hstore column in .net

I was wondering what was the best way to retrieve and handle hstore data in .net.
If I do a basic query, then it'll output a string formatted that way :
"key1" => "value1", "key2" => "value2"
Looks alike KeyValuePair that I today parse like that :
SimpleJson.SimpleJson
.DeserializeObject<Dictionary<string, string>> ("{" + tags.Replace ("=>", ":") + "}");
I could do it manually with something like :
split first with ","
then loop and split with "=>"
then extract left to key and right to value
But then, what if there is a "," inside a value or a full element inside a value, should I do a recursive parsing? and there goes on all the questions about parsing a string :)
Do you see a better way than the trick of Json ?

You have a few options. I can't say whether any are better than what you have now. My recommendation actually would be to start beta testing 9.3 as soon as it is in beta and look at moving your code over there. 9.3's hstore has a cast to json, so you can just do it in the db. This may be a good reason to plan to be an early adopter. In essence in 9.3, you will just be able to:
SELECT my_hstore::json FROM mytable;
A second possibility is that you could create a structure in memory in PostgreSQL (for existing supported versions in 9.1 and 9.2) to write your own cast to json. For insight on how to do it, see Andrew's blog below.
A third possibility is to use existing functions of XML and write an XML cast. This is fairly easy. With a simple function you could easily do a:
SELECT my_hstore::xml FROM mytable;
The final possibility is to do string parsing in .net. This is the area I have the least knowledge of, but I would also suggest it is the last choice in large part because with the others, you have direct access to the hstore data interfaces, and also because the functions you would be using all have relatively tight contracts and a lot of other users.

Better method of handling/reading these files (HCFA medical claim form)

I'm looking for some suggestions on better approaches to handling a scenario with reading a file in C#; the specific scenario is something that most people wouldn't be familiar with unless you are involved in health care, so I'm going to give a quick explanation first.
I work for a health plan, and we receive claims from doctors in several ways (EDI, paper, etc.). The paper form for standard medical claims is the "HCFA" or "CMS 1500" form. Some of our contracted doctors use software that allows their claims to be generated and saved in a HCFA "layout", but in a text file (so, you could think of it like being the paper form, but without the background/boxes/etc). I've attached an image of a dummy claim file that shows what this would look like.
The claim information is currently extracted from the text files and converted to XML. The whole process works ok, but I'd like to make it better and easier to maintain. There is one major challenge that applies to the scenario: each doctor's office may submit these text files to us in slightly different layouts. Meaning, Doctor A might have the patient's name on line 10, starting at character 3, while Doctor B might send a file where the name starts on line 11 at character 4, and so on. Yes, what we should be doing is enforcing a standard layout that must be adhered to by any doctors that wish to submit in this manner. However, management said that we (the developers) had to handle the different possibilities ourselves and that we may not ask them to do anything special, as they want to maintain good relationships.
Currently, there is a "mapping table" set up with one row for each different doctor's office. The table has columns for each field (e.g. patient name, Member ID number, date of birth etc). Each of these gets a value based on the first file that we received from the doctor (we manually set up the map). So, the column PATIENT_NAME might be defined in the mapping table as "10,3,25" meaning that the name starts on line 10, at character 3, and can be up to 25 characters long. This has been a painful process, both in terms of (a) creating the map for each doctor - it is tedious, and (b) maintainability, as they sometimes suddenly change their layout and then we have to remap the whole thing for that doctor.
The file is read in, line by line, and each line added to a
List<string>
Once this is done, we do the following, where we get the map data and read through the list of file lines and get the field values (recall that each mapped field is a value like "10,3,25" (without the quotes)):
ClaimMap M = ClaimMap.GetMapForDoctor(17);
List<HCFA_Claim> ClaimSet = new List<HCFA_Claim>();
foreach (List<string> cl in Claims) //Claims is List<List<string>>, where we have a List<string> for each claim in the text file (it can have more than one, and the file is split up into separate claims earlier in the process)
{
HCFA_Claim c = new HCFA_Claim();
c.Patient = new Patient();
c.Patient.FullName = cl[Int32.Parse(M.Name.Split(',')[0]) - 1].Substring(Int32.Parse(M.Name.Split(',')[1]) - 1, Int32.Parse(M.Name.Split(',')[2])).Trim();
//...and so on...
ClaimSet.Add(c);
}
Sorry this is so long...but I felt that some background/explanation was necessary. Are there any better/more creative ways of doing something like this?

Given the lack of standardization, I think your current solution although not ideal may be the best you can do. Given this situation, I would at least isolate concerns e.g. file read, file parsing, file conversion to standard xml, mapping table access etc. to simple components employing obvious patterns e.g. DI, strategies, factories, repositories etc. where needed to decouple the system from the underlying dependency on the mapping table and current parsing algorithms.

You need to work on the DRY (Don't Repeat Yourself) principle by separating concerns.
For example, the code you posted appears to have an explicit knowledge of:
how to parse the claim map, and
how to use the claim map to parse a list of claims.
So there are at least two responsibilities directly relegated to this one method. I'd recommend changing your ClaimMap class to be more representative of what it's actually supposed to represent:
public class ClaimMap
{
public ClaimMapField Name{get;set;}
...
}
public class ClaimMapField
{
public int StartingLine{get;set;}
// I would have the parser subtract one when creating this, to make it 0-based.
public int StartingCharacter{get;set;}
public int MaxLength{get;set;}
}
Note that the ClaimMapField represents in code what you spent considerable time explaining in English. This reduces the need for lengthy documentation. Now all the M.Name.Split calls can actually be consolidated into a single method that knows how to create ClaimMapFields out of the original text file. If you ever need to change the way your ClaimMaps are represented in the text file, you only have to change one point in code.
Now your code could look more like this:
c.Patient.FullName = cl[map.Name.StartingLine].Substring(map.Name.StartingCharacter, map.Name.MaxLength).Trim();
c.Patient.Address = cl[map.Address.StartingLine].Substring(map.Address.StartingCharacter, map.Address.MaxLength).Trim();
...
But wait, there's more! Any time you see repetition in your code, that's a code smell. Why not extract out a method here:
public string ParseMapField(ClaimMapField field, List<string> claim)
{
return claim[field.StartingLine].Substring(field.StartingCharacter, field.MaxLength).Trim();
}
Now your code can look more like this:
HCFA_Claim c = new HCFA_Claim
{
Patient = new Patient
{
FullName = ParseMapField(map.Name, cl),
Address = ParseMapField(map.Address, cl),
}
};
By breaking the code up into smaller logical pieces, you can see how each piece becomes very easy to understand and validate visually. You greatly reduce the risk of copy/paste errors, and when there is a bug or a new requirement, you typically only have to change one place in code instead of every line.

If you are only getting unstructured text, you have to parse it. If the text content changes you have to fix your parser. There's no way around this. You could probably find a 3rd party application to do some kind of visual parsing where you highlight the string of text you want and it does all the substring'ing for you but still unstructured text == parsing == fragile. A visual parser would at least make it easier to see mistakes/changed layouts and fix them.
As for parsing it yourself, I'm not sure about the line-by-line approach. What if something you're looking for spans multiple lines? You could bring the whole thing in a single string and use IndexOf to substring that with different indices for each piece of data you're looking for.
You could always use RegEx instead of Substring if you know how to do that.

While the basic approach your taking seems appropriate for your situation, there are definitely ways you could clean up the code to make it easier to read and maintain. By separating out the functionality that you're doing all within your main loop, you could change this:
c.Patient.FullName = cl[Int32.Parse(M.Name.Split(',')[0]) - 1].Substring(Int32.Parse(M.Name.Split(',')[1]) - 1, Int32.Parse(M.Name.Split(',')[2])).Trim();
to something like this:
var parser = new FormParser(cl, M);
c.PatientFullName = FormParser.GetName();
c.PatientAddress = FormParser.GetAddress();
// etc
So, in your new class, FormParser, you pass the List that represents your form and the claim map for the provider into the constructor. You then have a getter for each property on the form. Inside that getter, you perform your parsing/substring logic like you're doing now. Like I said, you're not really changing the method by which your doing it, but it certainly would be easier to read and maintain and might reduce your overall stress level.

Custom .Net library or existing library to handle sequence formatting?

I am thinking a library already exists for this, but I need allow my users to create a numbering format for their documents.
For example, let's say we have an RFI from and the user has a specific format the numbering sequence needs to be in. A typical RFI number looks like this for their system: R0000100. The next RFI in line would be R0000101.
Before I set out to creating a formatting engine for numbers such as these, does something already exist that can accommodate this?
Update:
I failed to save the edit to this question. Anyway, I also want to give the users the ability to create their own formats. So, I may have a form where they can input the format: R####### And also allow them to specify the starting integer: in the case 100. Also, I may want to allow them to specify how they want to increment. maybe only by 100s. So the next number may be R0000200. I know this may sound ridiculous, but you never know. That is why I asked if something like this already exists.

If you keep value and format separated, you won't need a library or such a thing.
The numbers would be simple, say, integers i, i.e. 100, 101, 102, that you manage/store however you see fit. The formatting part would simply be a matter of R + i.ToString("0000000"), or if you want to have the format as a string literal string.Format("R{0:0000000}", i).
I know, this might only be an example, but as your question stands, the formatting options, that .NET provides out of the box seem to suffice.

The incrementing of identity field values is most often handled in an RDBMS-style database. This comes with a few benefits, such as built-in concurrency handling. If you want to generate the values yourself, a simple class to get the last-issued value and increment by one would be very easy to create. Make it thread-safe so you don't get any duplicates or gaps and you'll be good to go.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.