Keep optional pipe in HL7 after parsing - c#

Original HL7
MSH|^~\&|RadImage^124|xxx|EI-ARTEFACT|xxx|123456789||ORM^O01|1234||2.3|||AL
PID|1|xxxxxx|xxxx||xxxxx^xxxxx xxxxx|xxx xxx|19391007|F|||104-430, xxx^^xxx^xx^xx^xx||(999)999-999|"||V|||||"||||||||"|N
PV1|1|A|11^11-1^^^^^2|||||123^xxx, xxx|||||||||123^xxx, xxx|||01|||||||||||||||||||NA|||||20191211082900|||||||
ORC|XO|"^"|xxx||CM||^^^xxx^^R||123456789|INTERF^INTERFACE||123^xxx, xxx|HOSPI^Hospitalisé|||KDICTE|3A^3A||"^"
OBR|1|"^"|xxx|82561^SCAN SINUS C+^^82561^SCAN SINUS C+|VU|xxx|"|"|||||"|||1234^xxx, xxx||xx|xxx|xxx|IMAGES^|xxxx||CT|"||^^^xxx^^VU||||AAAA~BBB~CCC|"^"||","~"|"|xxx|A|B|||
ZDS|1.11.11.11.1.11.1.1.11^RadImage^Application^DICOM
End result HL7
MSH|^~\&|RadImage^124|xxx|EI-ARTEFACT|xxx|123456789||ORM^O01|1234||2.3|||AL
PID|1|xxxxxx|xxxx||xxxxx^xxxxx xxxxx|xxx xxx|19391007|F|||104-430, xxx^^xxx^xx^xx^xx||(999)999-999|"||V|||||"||||||||"|N
PV1|1|A|11^11-1^^^^^2|||||123^xxx, xxx|||||||||123^xxx, xxx|||01|||||||||||||||||||NA|||||20191211082900
ORC|XO|"^"|xxx||CM||^^^xxx^^R||123456789|INTERF^INTERFACE||123^xxx, xxx|HOSPI^Hospitalisé|||KDICTE|3A^3A||"^"
OBR|1|"^"|xxx|82561^SCAN SINUS C+^^82561^SCAN SINUS C+|VU|xxx|"|"|||||"|||1234^xxx, xxx||xx|xxx|xxx|IMAGES^|xxxx||CT|"||^^^xxx^^VU||||AAAA~BBB~CCC|"^"||","~"|"|xxx|A|B|||
ZDS|1.11.11.11.1.11.1.1.11^RadImage^Application^DICOM
Hi,
I'm building a DLL in C# that parses and modifies an HL7 message using the nHapi HL7 library.
The only thing I'm struggling with is keeping the empty pipes at the end of the PV1 segment. They are removed in the "End result HL7" compared with the "Original HL7".
I would like to keep those pipes.
This is my current code:
...
using NHapi.Base.Model;
using NHapi.Base.Parser;
using NHapi.Base.Util;
using System.Diagnostics;
using NHapi.Model.V23.Segment;
using NHapi.Model.V22.Segment;
using NHapi.Model.V21.Segment;
using NHapi.Model.V231.Segment;
...
...
public void PreAnalysis(ITratmContext ctx, MemBuf mb)
{
var parser = new PipeParser();
Debug.WriteLine(mb.ToString());
var parsedMessage = parser.Parse(mb.ToString());
var pipeDelimitedMessage = parser.Encode(parsedMessage);
Debug.WriteLine(pipeDelimitedMessage); //Message lose the empty pipe HERE
var genericMethod = parsedMessage as AbstractMessage;
// create a terser object instance by wrapping it around the message object
Terser terser = new Terser(parsedMessage);
OurTerserHelper terserHelper = new OurTerserHelper(terser);
String terserExpression = "MSH-12";
String HL7Version = terserHelper.GetData(terserExpression);
if (HL7Version == "2.3")
{
var obr = genericMethod.GetStructure("OBR") as NHapi.Model.V23.Segment.OBR;
if (obr != null)
{
for (int i = 0; i < obr.ReasonForStudyRepetitionsUsed; i++)
{
obr.GetReasonForStudy(i).Identifier.Value = StringExtention.Clean(obr.GetReasonForStudy(i).Identifier.ToString());
}
}
//var obrRep = obr.ReasonForStudyRepetitionsUsed;
Debug.WriteLine(parser.Encode(genericMethod.Message));
mb.Init(parser.Encode(genericMethod.Message));
}
}
Thank you very much !!!!

There is no need to keep any field separators after the last populated field in a segment. They are superfluous and a waste of space.

I don't see a point in having a field separator after the last populated field. But if you insist on doing this, you could append a custom separator at the end.
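If the trailing separators really must survive, a minimal sketch of the "append it yourself" idea is below. It only does string handling on the original and re-encoded messages and does not rely on any nHapi setting. The class and method names are hypothetical, and it assumes the segments stay in the same order and number after parsing.

```csharp
using System;

public static class Hl7TrailingPipes
{
    // Hypothetical helper: copies the trailing field separators of each original
    // segment onto the re-encoded segment at the same position.
    public static string RestoreTrailingSeparators(string original, string encoded, char fieldSeparator = '|')
    {
        char[] segmentBreaks = { '\r', '\n' };
        string[] originalSegments = original.Split(segmentBreaks, StringSplitOptions.RemoveEmptyEntries);
        string[] encodedSegments = encoded.Split(segmentBreaks, StringSplitOptions.RemoveEmptyEntries);

        // If segments were added or removed, positions no longer line up,
        // so leave the encoded message untouched.
        if (originalSegments.Length != encodedSegments.Length)
        {
            return encoded;
        }

        for (int i = 0; i < encodedSegments.Length; i++)
        {
            // Count how many separators the original segment ended with.
            int trailing = originalSegments[i].Length - originalSegments[i].TrimEnd(fieldSeparator).Length;
            encodedSegments[i] = encodedSegments[i].TrimEnd(fieldSeparator) + new string(fieldSeparator, trailing);
        }

        // HL7 v2 segments are terminated by a carriage return.
        return string.Join("\r", encodedSegments) + "\r";
    }
}
```

In PreAnalysis this would be applied to the encoder output before mb.Init, for example RestoreTrailingSeparators(originalText, parser.Encode(genericMethod.Message)), where originalText is the value of mb.ToString() captured before parsing.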

Related

Importing a File with Dynamic Columns

I am new to SSIS and C#. In SQL Server 2008 I am importing data from a .csv file. The number of columns is dynamic: usually around 22, sometimes more or fewer. I created a staging table with 25 columns and import the data into it. In essence, each flat file that I import has a different number of columns, but all of them are properly formatted. My task is to import all the rows from a .csv flat file, including the headers. I want to put this in a job so I can import multiple files into the table daily.
So inside a Foreach Loop I have a Data Flow Task, within which I have a Script Component. After some research online I came up with the C# code below, but I get this error:
Index was outside the bounds of the array.
I tried to find the cause using MessageBox and found that it reads the first line, and the index goes outside the bounds of the array after the first line.
1.) I need your help fixing the code.
2.) My File1Conn is the flat file connection; instead, I want to read the file directly from a variable, User::FileName, that my Foreach Loop keeps updating. Please help with modifying the code below.
Thanks in advance.
This is my flat file:
https://drive.google.com/file/d/0B418ObdiVnEIRnlsZFdwYTRfTFU/view?usp=sharing
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Windows.Forms;
using System.IO;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
private StreamReader SR;
private string File1;
public override void AcquireConnections(object Transaction)
{
// Get the connection for File1
IDTSConnectionManager100 CM = this.Connections.File1Conn;
File1 = (string)CM.AcquireConnection(null);
}
public override void PreExecute()
{
base.PreExecute();
SR = new StreamReader(File1);
}
public override void PostExecute()
{
base.PostExecute();
SR.Close();
}
public override void CreateNewOutputRows()
{
// Declare variables
string nextLine;
string[] columns;
char[] delimiters;
int Col4Count;
String[] Col4Value = new string[50];
// Set the delimiter
delimiters = ";".ToCharArray();
// Read the first line (header)
nextLine = SR.ReadLine();
// Split the line into columns
columns = nextLine.Split(delimiters);
// Find out how many Col3 there are in the file
Col4Count = columns.Length - 3;
//MessageBox.Show(Col4Count.ToString());
// Read the second line and loop until the end of the file
nextLine = SR.ReadLine();
while (nextLine != null)
{
// Split the line into columns
columns = nextLine.Split(delimiters);
{
// Add a row
File1OutputBuffer.AddRow();
// Set the values of the Script Component output according to the file content
File1OutputBuffer.SampleID = columns[0];
File1OutputBuffer.RepNumber = columns[1];
File1OutputBuffer.Product = columns[2];
File1OutputBuffer.Col1 = columns[3];
File1OutputBuffer.Col2 = columns[4];
File1OutputBuffer.Col3 = columns[5];
File1OutputBuffer.Col4 = columns[6];
File1OutputBuffer.Col5 = columns[7];
File1OutputBuffer.Col6 = columns[8];
File1OutputBuffer.Col7 = columns[9];
File1OutputBuffer.Col8 = columns[10];
File1OutputBuffer.Col9 = columns[11];
File1OutputBuffer.Col10 = columns[12];
File1OutputBuffer.Col11 = columns[13];
File1OutputBuffer.Col12 = columns[14];
File1OutputBuffer.Col13 = columns[15];
File1OutputBuffer.Col14 = columns[16];
File1OutputBuffer.Col15 = columns[17];
File1OutputBuffer.Col16 = columns[18];
}
// Read the next line
nextLine = SR.ReadLine();
}
}
}
As you mentioned, the file has a dynamic number of columns, so in your script component you need to count the columns by splitting on the delimiter and then redirect the rows to different outputs.
For your second question, you can assign your variable to the connection string property of the flat file connection manager. You can then read the variable's value directly in your script.
As an alternative to the script component, you can create a "one column" flat file source by using a dummy delimiter. Then, in the data flow task, you can read the number of columns into a variable, use a conditional split on the data flow, and redirect the outputs to different destinations. An example can be found at http://sqlcodespace.blogspot.com.au/2015/03/ssis-design-pattern-handling-flat-file.html
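The "Index was outside the bounds of the array" error in the question comes from reading columns[18] on a row that has fewer than 19 values. As a minimal sketch of a simpler alternative to separate outputs, the row can be padded to a fixed width after splitting. This swaps in a padding technique that the answer above does not describe, and the helper name is hypothetical.

```csharp
using System;

public static class CsvRowPadding
{
    // Hypothetical helper: splits a line and pads (or truncates) the result to a
    // fixed width, so that columns[i] is always in range for i < width.
    public static string[] SplitToFixedWidth(string line, char delimiter, int width)
    {
        string[] parts = line.Split(delimiter);
        string[] padded = new string[width];
        for (int i = 0; i < width; i++)
        {
            // Missing trailing columns become empty strings.
            padded[i] = i < parts.Length ? parts[i] : string.Empty;
        }
        return padded;
    }
}
```

In CreateNewOutputRows the two Split calls would become columns = CsvRowPadding.SplitToFixedWidth(nextLine, ';', 19); since the code reads indexes 0 through 18.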

When using MergeField FieldCodes in OpenXml SDK in C# why do field codes disappear or fragment?

I have been working successfully with the C# OpenXml SDK (Unofficial Microsoft Package 2.5 from NuGet) for some time now, but have recently noticed that the following line of code returns different results depending on what mood Microsoft Word appears to be in when the file gets saved:
var fields = document.Descendants<FieldCode>();
From what I can tell, when creating the document in the first place (using Word 2013 on Windows 8.1) if you use the Insert->QuickParts->Field and choose MergeField from the Field names left hand pane, and then provide a Field name in the field properties and click OK then the field code is correctly saved in the document as I would expect.
Then when using the aforementioned line of code I will receive a field code count of 1 field. If I subsequently edit this document (and even leave this field well alone) the subsequent saving could mean that this field code no longer is returned in my query.
Another case of the same curiousness is when I see the FieldCode nodes split across multiple items. So rather than seeing say:
" MERGEFIELD Author \\* MERGEFORMAT "
As the node name, I will see:
" MERGEFIELD Aut"
"hor \\* MERGEFORMAT"
Split as two FieldCode node values. I have no idea why this would be the case, but it certainly makes my ability to match nodes that much more exciting. Is this expected behaviour? A known bug? I don't really want to have to crack open the raw xml and edit this document to work until I understand what is going on. Many thanks all.
I came across this very problem myself, and found a solution that exists within OpenXML: a utility class called MarkupSimplifier which is part of the PowerTools for Open XML project. Using this class solved all the problems I was having that you describe.
The full article is located here.
Here are some pertinent excerpts:
Perhaps the most useful simplification that this performs is to merge adjacent runs with identical formatting.
It goes on to say:
Open XML applications, including Word, can arbitrarily split runs as necessary. If you, for instance, add a comment to a document, runs will be split at the location of the start and end of the comment. After MarkupSimplifier removes comments, it can merge runs, resulting in simpler markup.
An example of the utility class in use is:
SimplifyMarkupSettings settings = new SimplifyMarkupSettings
{
RemoveComments = true,
RemoveContentControls = true,
RemoveEndAndFootNotes = true,
RemoveFieldCodes = false,
RemoveLastRenderedPageBreak = true,
RemovePermissions = true,
RemoveProof = true,
RemoveRsidInfo = true,
RemoveSmartTags = true,
RemoveSoftHyphens = true,
ReplaceTabsWithSpaces = true,
};
MarkupSimplifier.SimplifyMarkup(wordDoc, settings);
I have used this many times with Word 2010 documents using VS2015 .Net Framework 4.5.2 and it has made my life much, much easier.
Update:
I have revisited this code and found that it cleans up runs on MERGEFIELDs but not on IF fields that reference merge fields, e.g.
{if {MERGEFIELD When39} = "Y???" "Y" "N" }
I have no idea why this might be so, and examination of the underlying XML offers no hints.
Word will often split a text run into multiple text runs for no reason I've ever understood. When searching, comparing, tidying, etc., we preprocess the body with a method that combines multiple runs into a single text run.
/// <summary>
/// Combines the identical runs.
/// </summary>
/// <param name="body">The body.</param>
public static void CombineIdenticalRuns(W.Body body)
{
List<W.Run> runsToRemove = new List<W.Run>();
foreach (W.Paragraph para in body.Descendants<W.Paragraph>())
{
List<W.Run> runs = para.Elements<W.Run>().ToList();
for (int i = runs.Count - 2; i >= 0; i--)
{
W.Text text1 = runs[i].GetFirstChild<W.Text>();
W.Text text2 = runs[i + 1].GetFirstChild<W.Text>();
if (text1 != null && text2 != null)
{
string rPr1 = "";
string rPr2 = "";
if (runs[i].RunProperties != null) rPr1 = runs[i].RunProperties.OuterXml;
if (runs[i + 1].RunProperties != null) rPr2 = runs[i + 1].RunProperties.OuterXml;
if (rPr1 == rPr2)
{
text1.Text += text2.Text;
runsToRemove.Add(runs[i + 1]);
}
}
}
}
foreach (W.Run run in runsToRemove)
{
run.Remove();
}
}
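When the goal is only to read field codes rather than rewrite the document, the fragments can also be joined at read time. Below is a minimal sketch that works on the part's raw XML with System.Xml.Linq instead of the OpenXml SDK (an assumption made to keep it self-contained). It concatenates the w:instrText fragments, which the SDK exposes as FieldCode, found between a "begin" field character and the next "separate" or "end". The class name is hypothetical, and nested fields such as an IF that contains a MERGEFIELD are not handled: only the innermost field is returned.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

public static class FieldCodeReader
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per complex field, joining the instrText fragments
    // that Word split across several runs.
    public static List<string> ReadFieldInstructions(XElement root)
    {
        var results = new List<string>();
        StringBuilder current = null;

        // Descendants() walks the tree in document order.
        foreach (XElement element in root.Descendants())
        {
            if (element.Name == W + "fldChar")
            {
                string type = (string)element.Attribute(W + "fldCharType");
                if (type == "begin")
                {
                    current = new StringBuilder();
                }
                else if ((type == "separate" || type == "end") && current != null)
                {
                    results.Add(current.ToString());
                    current = null;
                }
            }
            else if (element.Name == W + "instrText" && current != null)
            {
                current.Append(element.Value);
            }
        }
        return results;
    }
}
```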
I tried to simplify the document with PowerTools, but the result was a corrupted Word file. I wrote this routine to simplify only the field codes that have specific names; it works in all parts of the document (main document part, headers and footers):
internal static void SimplifyFieldCodes(WordprocessingDocument document)
{
var masks = new string[] { Constants.VAR_MASK, Constants.INP_MASK, Constants.TBL_MASK, Constants.IMG_MASK, Constants.GRF_MASK };
SimplifyFieldCodesInElement(document.MainDocumentPart.RootElement, masks);
foreach (var headerPart in document.MainDocumentPart.HeaderParts)
{
SimplifyFieldCodesInElement(headerPart.Header, masks);
}
foreach (var footerPart in document.MainDocumentPart.FooterParts)
{
SimplifyFieldCodesInElement(footerPart.Footer, masks);
}
}
internal static void SimplifyFieldCodesInElement(OpenXmlElement element, string[] regexpMasks)
{
foreach (var run in element.Descendants<Run>()
.Select(item => (Run)item)
.ToList())
{
var fieldChar = run.Descendants<FieldChar>().FirstOrDefault();
if (fieldChar != null && fieldChar.FieldCharType == FieldCharValues.Begin)
{
string fieldContent = "";
List<Run> runsInFieldCode = new List<Run>();
var currentRun = run.NextSibling();
while ((currentRun is Run) && currentRun.Descendants<FieldCode>().FirstOrDefault() != null)
{
var currentRunFieldCode = currentRun.Descendants<FieldCode>().FirstOrDefault();
fieldContent += currentRunFieldCode.InnerText;
runsInFieldCode.Add((Run)currentRun);
currentRun = currentRun.NextSibling();
}
// If there is more than one Run for the FieldCode, and it is one we must change, put the complete text in the first Run and remove the rest
if (runsInFieldCode.Count > 1)
{
// Check whether the field code is one we must simplify (so that TOC, PAGEREF, etc. are left unchanged)
bool applyTransform = false;
foreach (string regexpMask in regexpMasks)
{
Regex regex = new Regex(regexpMask);
Match match = regex.Match(fieldContent);
if (match.Success)
{
applyTransform = true;
break;
}
}
if (applyTransform)
{
var currentRunFieldCode = runsInFieldCode[0].Descendants<FieldCode>().FirstOrDefault();
currentRunFieldCode.Text = fieldContent;
runsInFieldCode.RemoveAt(0);
foreach (Run runToRemove in runsInFieldCode)
{
runToRemove.Remove();
}
}
}
}
}
}
Hope this helps!!!

Simplistic replacement of tokens in a Word Document using OpenXML SDK

I have a requirement where I would like users to type some string tokens into a Word document so that they can be replaced via a C# application with some values. So say I have a document as per the image
Now using the SDK I can read the document as follows:
private void InternalParseTags(WordprocessingDocument aDocumentToManipulate)
{
StringBuilder sbDocumentText = new StringBuilder();
using (StreamReader sr = new StreamReader(aDocumentToManipulate.MainDocumentPart.GetStream()))
{
sbDocumentText.Append(sr.ReadToEnd());
}
however as this comes back as the raw XML I cannot search for the tags easily as the underlying XML looks like:
<w:t><:</w:t></w:r><w:r w:rsidR="002E53FF" w:rsidRPr="000A794A"><w:t>Person.Meta.Age
(and obviously is not something I would have control over) instead of what I was hoping for namely:
<w:t><: Person.Meta.Age
OR
<w:t><: Person.Meta.Age
So my question is how do I actually work on the string itself namely
<: Person.Meta.Age :>
and still preserve formatting etc. so that when I have replaced the tokens with values I have:
Note: Bolding of the value of the second token value
Do I need to iterate document elements or use some other approach? All pointers greatly appreciated.
This is a bit of a thorny problem with OpenXML. The best solution I've come across is explained here:
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/06/13/open-xml-presentation-generation-using-a-template-presentation.aspx
Basically Eric expands the content such that each character is in a run by itself, then looks for the run that starts a '<:' sequence and then the end sequence. Then he does the substitution and recombines all runs that have the same attributes.
The example is for PowerPoint, which is generally much less content-intensive, so performance might be a factor in Word; I expect there are ways to narrow down the scope of paragraphs or whatever you have to blow up.
For example, you can extract the text of the paragraph to see if it includes any placeholders and only do the expand/replace/condense operation on those paragraphs.
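A minimal sketch of that pre-filter is below, assuming the part's XML has been loaded with System.Xml.Linq rather than the OpenXml SDK, and using the "<:" and ":>" markers from the question. The class and method names are hypothetical. It joins the text of every w:t in a paragraph, so a token split across runs is still detected, and yields only the paragraphs that need the expand/replace/condense treatment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PlaceholderScan
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Yields the paragraphs whose combined text contains a start marker
    // followed later by an end marker.
    public static IEnumerable<XElement> ParagraphsWithTokens(
        XElement body, string startMarker = "<:", string endMarker = ":>")
    {
        foreach (XElement paragraph in body.Descendants(W + "p"))
        {
            // Concatenate all text fragments, ignoring run boundaries.
            string text = string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
            int start = text.IndexOf(startMarker, StringComparison.Ordinal);
            if (start >= 0 &&
                text.IndexOf(endMarker, start + startMarker.Length, StringComparison.Ordinal) >= 0)
            {
                yield return paragraph;
            }
        }
    }
}
```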
Instead of doing find/replace of tokens directly with OpenXML, you could use a third-party OpenXML-based templating toolkit, which is trivial to use and can soon pay for itself.
As Scanny pointed out, OpenXML is full of nasty details that one has to master on a one-by-one basis. The learning curve is long and steep. If you want to become an OpenXML guru, then go for it and start climbing. If you want to have time for a decent social life, there are alternatives: just pick a third-party toolkit that is based on OpenXML. I've evaluated Docentric Toolkit. It offers a template-based approach, where you prepare a template, which is a file in Word format that contains placeholders for data that gets merged from the application at runtime. The placeholders support any formatting that MS Word supports; you can use conditional content, tables, etc.
You can also create or change a document using a DOM approach. The final document can be .docx or .pdf.
Docentric is a licensed product, but the time you save by using one of these tools will soon compensate for the cost.
If you will be running your application on a server, don't use interop - see this link for more details: (http://support2.microsoft.com/kb/257757).
Here is some code I slapped together pretty quickly to account for tokens spread across runs in the xml. I don't know the library much, but was able to get this to work. This could use some performance enhancements too because of all the looping.
/// <summary>
/// Iterates through texts, concatenates them and looks for tokens to replace
/// </summary>
/// <param name="texts"></param>
/// <param name="tokenNameValuePairs"></param>
/// <returns>T/F whether a token was replaced. Should loop this call until it returns false.</returns>
private bool IterateTextsAndTokenReplace(IEnumerable<Text> texts, IDictionary<string, object> tokenNameValuePairs)
{
List<Text> tokenRuns = new List<Text>();
string runAggregate = String.Empty;
bool replacedAToken = false;
foreach (var run in texts)
{
if (run.Text.Contains(prefixTokenString) || runAggregate.Contains(prefixTokenString))
{
runAggregate += run.Text;
tokenRuns.Add(run);
if (run.Text.Contains(suffixTokenString))
{
if (possibleTokenRegex.IsMatch(runAggregate))
{
string possibleToken = possibleTokenRegex.Match(runAggregate).Value;
string innerToken = possibleToken.Replace(prefixTokenString, String.Empty).Replace(suffixTokenString, String.Empty);
if (tokenNameValuePairs.ContainsKey(innerToken))
{
//found token!!!
string replacementText = runAggregate.Replace(prefixTokenString + innerToken + suffixTokenString, Convert.ToString(tokenNameValuePairs[innerToken]));
Text newRun = new Text(replacementText);
run.InsertAfterSelf(newRun);
foreach (Text runToDelete in tokenRuns)
{
runToDelete.Remove();
}
replacedAToken = true;
}
}
runAggregate = String.Empty;
tokenRuns.Clear();
}
}
}
return replacedAToken;
}
string prefixTokenString = "{";
string suffixTokenString = "}";
Regex possibleTokenRegex = new Regex(prefixTokenString + "[a-zA-Z0-9-_]+" + suffixTokenString);
And some samples of calling the function:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(memoryStream, true))
{
bool replacedAToken = true;
//Keep looping over the document until no more tokens are replaced. Some tokens are spread across 'runs' and may need a second pass to be caught.
while (replacedAToken)
{
//get all the text elements
IEnumerable<Text> texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>();
replacedAToken = this.IterateTextsAndTokenReplace(texts, tokenNameValuePairs);
}
wordDoc.MainDocumentPart.Document.Save();
foreach (FooterPart footerPart in wordDoc.MainDocumentPart.FooterParts)
{
if (footerPart != null)
{
Footer footer = footerPart.Footer;
if (footer != null)
{
replacedAToken = true;
while (replacedAToken)
{
IEnumerable<Text> footerTexts = footer.Descendants<Text>();
replacedAToken = this.IterateTextsAndTokenReplace(footerTexts, tokenNameValuePairs);
}
footer.Save();
}
}
}
foreach (HeaderPart headerPart in wordDoc.MainDocumentPart.HeaderParts)
{
if (headerPart != null)
{
Header header = headerPart.Header;
if (header != null)
{
replacedAToken = true;
while (replacedAToken)
{
IEnumerable<Text> headerTexts = header.Descendants<Text>();
replacedAToken = this.IterateTextsAndTokenReplace(headerTexts, tokenNameValuePairs);
}
header.Save();
}
}
}
}

StreamReader get string between certain characters

I have a program that sends emails using templates via a web service. To test the templates, I made a simple program that reads a template, fills it with dummy values and sends it. The problem is that the templates have different 'fill in' variable names. So what I want to do is open the template, make a list of the variables and then fill them with dummy text.
Right now I have something like:
StreamReader SR = new StreamReader(myPath);
.... //Email code here
Msg.Body = SR.ReadToEnd();
SR.Close();
Msg.Body = Msg.Body.Replace("%myFillInVariable%", "Test String");
....
So I'm thinking of opening the template, searching for the values between "%" characters and putting them in an ArrayList, then doing the Msg.Body = SR.ReadToEnd(); part, and finally looping over the ArrayList and doing the Replace part using each value in it.
What I can't find is how to read the value between the % tags. Any suggestions on what method to use will be greatly appreciated.
Thanks,
MORE DETAILS:
Sorry if I wasn't clear. I'm passing the name of the TEMPLATE to the script from a drop-down. I might have a few dozen templates and they all have different %VariableToBeReplace% names. That is why I want to read the template with the StreamReader, find all the %value names%, put them into an array AND THEN fill them in, which I already know how to do. Getting the names of what I need to replace in code is the part I don't know how to do.
I am not sure on your question either but here is a sample of how to do the replacement.
You can run and play with this example in LinqPad.
Copy this content into a file and change the path to what you want. Content:
Hello %FirstName% %LastName%,
We would like to welcome you and your family to our program at the low cost of %currentprice%. We are glad to offer you this %Service%
Thanks,
Some Person
Code:
var content = string.Empty;
using(var streamReader = new StreamReader(@"C:\EmailTemplate.txt"))
{
content = streamReader.ReadToEnd();
}
var matches = Regex.Matches(content, @"%(.*?)%", RegexOptions.ExplicitCapture);
var extractedReplacementVariables = new List<string>(matches.Count);
foreach(Match match in matches)
{
extractedReplacementVariables.Add(match.Value);
}
extractedReplacementVariables.Dump("Extracted KeyReplacements");
//Do your code here to populate these, this part is just to show it still works
//Modify to meet your needs
var replacementsWithValues = new Dictionary<string, string>(extractedReplacementVariables.Count);
for(var i = 0; i < extractedReplacementVariables.Count; i++)
{
replacementsWithValues.Add(extractedReplacementVariables[i], "TestValue" + i);
}
content.Dump("Template before Variable Replacement");
foreach(var key in replacementsWithValues.Keys)
{
content = content.Replace(key, replacementsWithValues[key]);
}
content.Dump("Template After Variable Replacement");
Result from LinqPad:
I am not really sure that I understood your question, but you could try putting your 'fill in variable' on the first line of the template.
Something like:
StreamReader SR = new StreamReader(myPath);
String fill_in_var=SR.ReadLine();
String line;
while((line = SR.ReadLine()) != null)
{
Msg.Body+=line;
}
SR.Close();
Msg.Body = Msg.Body.Replace(fill_in_var, "Test String");
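For the part the question actually asks about, collecting the names between the % signs, here is a minimal sketch along the lines of the regex approach above. The class and method names are hypothetical. It returns each name once, because a template that repeats a variable would otherwise produce duplicates, and adding a duplicate key to a Dictionary throws. It assumes a variable never spans lines and that the template contains no literal percent signs, since text such as "50% off, 10% extra" would be picked up as a variable.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateVariables
{
    // Returns the distinct variable names found between % signs,
    // without the surrounding % characters, in order of first appearance.
    public static List<string> Extract(string template)
    {
        var names = new List<string>();
        foreach (Match match in Regex.Matches(template, @"%([^%\r\n]+)%"))
        {
            string name = match.Groups[1].Value;
            if (!names.Contains(name))
            {
                names.Add(name);
            }
        }
        return names;
    }
}
```

Each name can then be replaced with Msg.Body.Replace("%" + name + "%", "Test String").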

How to load quickdic dictionary into C#

I have downloaded a dictionary file from http://code.google.com/p/quickdic-dictionary/
But the file extension is .quickdic and is not plain text.
How can I load the quickdic dictionaries (.quickdic) into c# to make simple word queries?
I browsed through the git code, and a few things stuck out.
First, in the DictionaryActivity.java file, there is the following in onCreate():
final String name = application.getDictionaryName(dictFile.getName());
this.setTitle("QuickDic: " + name);
dictRaf = new RandomAccessFile(dictFile, "r");
dictionary = new Dictionary(dictRaf);
That Dictionary class is not the one built into Java; according to the imports, it comes from here:
import com.hughes.android.dictionary.engine.Dictionary;
When I look there, it shows a constructor for a Dictionary taking a RandomAccessFile as the parameter. Here's that source code:
public Dictionary(final RandomAccessFile raf) throws IOException {
dictFileVersion = raf.readInt();
if (dictFileVersion < 0 || dictFileVersion > CURRENT_DICT_VERSION) {
throw new IOException("Invalid dictionary version: " + dictFileVersion);
}
creationMillis = raf.readLong();
dictInfo = raf.readUTF();
// Load the sources, then seek past them, because reading them later disrupts the offset.
try {
final RAFList<EntrySource> rafSources = RAFList.create(raf, new EntrySource.Serializer(this), raf.getFilePointer());
sources = new ArrayList<EntrySource>(rafSources);
raf.seek(rafSources.getEndOffset());
pairEntries = CachingList.create(RAFList.create(raf, new PairEntry.Serializer(this), raf.getFilePointer()), CACHE_SIZE);
textEntries = CachingList.create(RAFList.create(raf, new TextEntry.Serializer(this), raf.getFilePointer()), CACHE_SIZE);
if (dictFileVersion >= 5) {
htmlEntries = CachingList.create(RAFList.create(raf, new HtmlEntry.Serializer(this), raf.getFilePointer()), CACHE_SIZE);
} else {
htmlEntries = Collections.emptyList();
}
indices = CachingList.createFullyCached(RAFList.create(raf, indexSerializer, raf.getFilePointer()));
} catch (RuntimeException e) {
final IOException ioe = new IOException("RuntimeException loading dictionary");
ioe.initCause(e);
throw ioe;
}
final String end = raf.readUTF();
if (!end.equals(END_OF_DICTIONARY)) {
throw new IOException("Dictionary seems corrupt: " + end);
}
So, anyway, this is how his Java code reads the file in.
Hopefully, this helps you simulate this in C#.
From here you would probably want to see how he is serializing the EntrySource, PairEntry, TextEntry, and HtmlEntry, as well as the indexSerializer.
Next look to see how RAFList.create() works.
Then see how that result is incorporated in creating a CachingList using CachingList.create()
Disclaimer: I'm not sure if the built in serializer in C# uses the same format as Java's, so you may need to simulate that too :)
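On that disclaimer: the header in the constructor above is read with RandomAccessFile.readInt, readLong and readUTF, which are plain binary reads rather than Java object serialization. They read big-endian values, and readUTF reads a two-byte length followed by modified UTF-8. C#'s BinaryReader is little-endian and its ReadString uses a different length prefix, so it cannot be used directly. A minimal sketch of equivalent readers is below. The class name is hypothetical, and decoding with Encoding.UTF8 is an approximation that matches modified UTF-8 only when the text contains no U+0000 and no characters outside the Basic Multilingual Plane.

```csharp
using System.IO;
using System.Text;

public static class JavaDataReader
{
    // Reads a 4-byte big-endian signed integer, like Java's readInt().
    public static int ReadInt32(Stream stream)
    {
        return unchecked((int)ReadBigEndian(stream, 4));
    }

    // Reads an 8-byte big-endian signed integer, like Java's readLong().
    public static long ReadInt64(Stream stream)
    {
        return ReadBigEndian(stream, 8);
    }

    // Reads a string written in readUTF() layout: a 2-byte big-endian byte count, then the bytes.
    public static string ReadUtf(Stream stream)
    {
        int byteCount = (int)ReadBigEndian(stream, 2);
        return Encoding.UTF8.GetString(ReadExactly(stream, byteCount));
    }

    static long ReadBigEndian(Stream stream, int byteCount)
    {
        long value = 0;
        foreach (byte b in ReadExactly(stream, byteCount))
        {
            // Most significant byte comes first.
            value = (value << 8) | b;
        }
        return value;
    }

    static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}
```

Reading the header would then be ReadInt32 for dictFileVersion, ReadInt64 for creationMillis and ReadUtf for dictInfo, in that order, on a FileStream opened on the .quickdic file. The RAFList sections that follow need their own layout worked out from the Java source.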
