Regex and Excel Cells - c#

There is an open-source .NET library called NPOI that allows you to manipulate Excel files. Unfortunately, its ShiftRows function does not adjust any cell references in formulas.
Therefore, I need to create a regex pattern to update them. Take for example a cell containing the following formula:
=(B7/C9) * (A10-B4)
I would like to bump every row reference by 1, so that it becomes
=(B8/C10) * (A11-B5)
Basically, I just need a pattern that will extract the numbers out into a "MatchCollection". I can do the rest.
Can anyone help?
Thanks.

Take a look at my answer to another question: Which regular expression is able to select excel column names in a formula in C#?
In that answer I match formulas and include code to increment the cell number (and column naming). Also, do a search on Excel column naming here and you'll find other ways to get the names generated or incremented. I probably could shorten the IncrementColumn MatchEvaluator's code using one of those methods but that's what I came up with at the time.

You should be very careful using regular expressions for this purpose.
Excel formulas can become very complex, especially with user-defined functions or when functions apply to ranges.
NPOI contains a Formula evaluation library and I think you should have a look at this as well - especially if the spreadsheets are out of your control.
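For simple formulas like the one in the question, the regex approach can be sketched as follows. This is only an illustration under stated assumptions: the class name FormulaShifter and its ShiftRows helper are hypothetical (not part of NPOI), and the pattern handles only plain A1-style references. It does not handle sheet names, quoted strings, or named ranges, and it shifts every row reference regardless of where rows were inserted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of NPOI.
public static class FormulaShifter
{
    // Group 1: optional '$' plus column letters, group 2: optional '$', group 3: row number.
    // The lookarounds keep the pattern from matching inside longer identifiers
    // or function names that end in digits, such as LOG10(.
    private static readonly Regex CellRef = new Regex(
        @"(?<![A-Za-z0-9_.])(\$?[A-Za-z]{1,3})(\$?)(\d+)(?![A-Za-z0-9_(])");

    // Adds delta to the row number of every matched cell reference.
    public static string ShiftRows(string formula, int delta)
    {
        return CellRef.Replace(formula, m =>
        {
            int row = int.Parse(m.Groups[3].Value) + delta;
            return m.Groups[1].Value + m.Groups[2].Value + row;
        });
    }
}
```

Calling FormulaShifter.ShiftRows("=(B7/C9) * (A10-B4)", 1) returns "=(B8/C10) * (A11-B5)". If only the numbers are wanted, CellRef.Matches(formula) yields a MatchCollection whose Groups[3] holds each row number.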


Find Critical Chi Square Value using MathNet.Numerics

I want to get the critical chi-square value from a significance level and the degrees of freedom. I tried using MathNet.Numerics but couldn't find which method returns the critical chi-square value.
This was the documentation I was referring to; any pointer to the correct documentation would help.
How I calculate the value in Excel is by using the formula =CHISQ.INV.RT(A2,B2)
The function you require is InvCDF(); it is used as follows:
MathNet.Numerics.Distributions.ChiSquared.InvCDF(degreesOfFreedom, probability);
I could finally solve this problem, so I want to share how I solved it.
I used the MathNet library. To reproduce the Excel function you mention, keep a few things in mind: the library has no direct equivalent of =CHISQ.INV.RT. Instead, in C# you use InvCDF (the equivalent of =CHISQ.INV in Excel), and instead of passing the right-tail probability, such as 0.05, you pass its complement within the interval (0, 1), so the parameter becomes 0.95.
The logic of this is in the description of the functions in Excel.
"CHISQ.INV" description says "Returns the inverse of the left-tailed probability of the chi-squared distribution", this one is the equivalent of ChiSquared.InvCDF (C#).
"CHISQ.INV.RT" description says "Returns the inverse of the right-tailed probability of the chi-squared distribution", this one DOES NOT exist in the MathNet library.
Example:
In Excel you write
=CHISQ.INV.RT(0.05, 9)
In C# you write
ChiSquared.InvCDF(9, 0.95);
In both cases the answer will be 16.9189776
Note that the order of the parameters is switched.
I hope I could help with this.
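The conversion described above can be wrapped in a small helper. This is a minimal sketch, assuming the MathNet.Numerics package is referenced; the class ChiSquareCritical and its RightTail method are hypothetical names, and only ChiSquared.InvCDF comes from the library.

```csharp
using MathNet.Numerics.Distributions;

// Hypothetical wrapper that mirrors Excel's CHISQ.INV.RT(probability, deg_freedom).
public static class ChiSquareCritical
{
    public static double RightTail(double probability, double degreesOfFreedom)
    {
        // InvCDF is left-tailed, so pass the complement of the right-tail probability.
        // Note the parameter order: degrees of freedom first, then probability.
        return ChiSquared.InvCDF(degreesOfFreedom, 1.0 - probability);
    }
}
```

With this helper, ChiSquareCritical.RightTail(0.05, 9) corresponds to =CHISQ.INV.RT(0.05, 9) in Excel.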

MS Word C# AddIn - how to edit xml of an open word document

Thanks for coming by :)
I need to modify the XML of an MS Word Document directly, because the Word Interop's capabilities are insufficient for what I need to do.
The trick is that I have to do it from a Word Add-In and apply it to the currently open document, so I can't open/save packages (right?). In short, several dozen articles like the one below are not applicable here:
https://msdn.microsoft.com/en-us/library/aa982683%28v=office.12%29.aspx
Any help would be appreciated :)
Example problem -- Remove custom cell margins from a really, really big table in Word (think 200x10) and check "Same as whole table" for each cell.
A lead on a solution (currenttable is the currently selected word table):
using System.Xml.Linq; // plus all the standard Word Add-In references
...
XDocument currentablexdocument = XDocument.Parse(currenttable.Range.WordOpenXML);
currentablexdocument.Descendants().Where(e => e.Name.LocalName.Equals("tcMar")).Remove();
currenttable.Range.Delete();
currentselection.InsertXML(currentablexdocument.ToString());
Explanation:
currenttable.Range.WordOpenXML provides me with well-formed XML representation of the table, which I then interpret as an XDocument
tcMar = table cell margins. These XML elements exist only if a cell has custom margins. Deleting all such elements does exactly what I need.
currenttable.Range.Delete() deletes the old table
currentselection.InsertXML(...) inserts the modified table XML into the document with margins fixed. Pretty much instantaneous. Yay!
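The tcMar removal step can be tried in isolation on any XML string, without Word running. A minimal sketch, assuming the standard WordprocessingML namespace; the class name CellMarginStripper is hypothetical, and the sketch covers only the XML transformation, not the question of writing the result back into the open document.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: strips custom cell margins from WordprocessingML.
public static class CellMarginStripper
{
    // Main WordprocessingML namespace (the "w:" prefix in Word's XML).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string RemoveCellMargins(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Matching on the qualified name avoids removing elements from other
        // namespaces that happen to share the local name "tcMar".
        doc.Descendants(W + "tcMar").Remove();

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

The input could be the string returned by currenttable.Range.WordOpenXML, and the output could be passed to InsertXML as in the snippet above.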
Problem:
Deleting and inserting the table is flaky and yields undesired results. It would be much better if I could MODIFY the xml directly. Is it possible?
Disclaimer:
Any other ideas of fixing this particular issue are welcome, but I have tried a myriad of possible solutions:
applying a table style: rejected by the client,
looping "SendKeys" commands to automate use of the Word interface: too unreliable,
changing Table.XXXPadding, Row.XXXPadding, Column.XXXPadding: doesn't affect custom cell margins (among other issues),
looping through cells to change their Cell.XXXPadding: too slow (freezes Word for several minutes on a 200x10 table). Note that it's accessing the padding that's slow; the loop itself takes 3 seconds to traverse the whole table when implemented correctly.
Of course, I tried it all with ScreenRefreshing = false and AllowAutoFit = false.
Somebody please help :)
Cheers!

Indirect code execution/C#

Background
We currently have an Excel-based system for creating specifications for a sound and lighting rental company.
Part of this is a column in the Excel sheet called 'autospec', which is made up of Excel formulas for individual stock items (e.g., you specify a loudspeaker, and the formulas in the autospec column then calculate the cables you need and add them to the specification automatically).
Sample formula
10m microphone cable might have a formula like the following:
=IF([loudspeakers]>0,[loudspeakers]*2,0)+([mixing desk]*4)
We're now moving over to a proper database with a C# front end.
Question
What I would like is to store the autospec formulas for each stock item in a table. When the user specs an item in the front end, the program should find the relevant formula, execute it, and change the spec quantity as appropriate.
Bottom line: I need to execute code contained within a string.
Am I going about this the wrong way?
Is there a better way?
I asked a similar question once: How can I evaluate a C# expression dynamically?
You could probably use that to evaluate those expressions. This is not really a safe thing to do unless you can guarantee nobody will be adding junk (read: evil code) to your database.
If you can break down the formulas into "families", such that each entry in your formula column is a member of a small (5-10) set of formulas with just different parameters, you could try something like this:
[ItemTable]<-[ItemFormulaParameters(param1, param2, param3)]->[FormulaTable(name)]
And have a factory method for instantiating the formula objects by name. Each such formula object has a "calculate(param1, param2, param3)" method...
Either build your own code interpreter, or build a 'dynamic rule engine' that creates some binary representation of C# rules that can be executed by a 'rule engine' you'd have to write.
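A minimal sketch of the factory idea, assuming one formula family that covers the sample formula above. All names here (FormulaFactory, Create, the "ScaledSum" family) are hypothetical; in a real system the family name and parameters would be loaded from the tables sketched above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical factory: maps a family name stored in the database to a calculation.
public static class FormulaFactory
{
    private static readonly Dictionary<string, Func<int[], int[], int>> Families =
        new Dictionary<string, Func<int[], int[], int>>
        {
            ["ScaledSum"] = ScaledSum
        };

    // Looks up a formula by name; unknown names are rejected rather than executed.
    public static Func<int[], int[], int> Create(string familyName)
    {
        if (!Families.TryGetValue(familyName, out var formula))
            throw new ArgumentException("Unknown formula family: " + familyName);
        return formula;
    }

    // Sum of quantity * factor over all source items; quantities at or below zero count as 0.
    private static int ScaledSum(int[] quantities, int[] factors)
    {
        int total = 0;
        for (int i = 0; i < quantities.Length; i++)
            total += Math.Max(quantities[i], 0) * factors[i];
        return total;
    }
}
```

For 3 loudspeakers and 1 mixing desk with factors 2 and 4, FormulaFactory.Create("ScaledSum")(new[] { 3, 1 }, new[] { 2, 4 }) gives the same quantity as the sample Excel formula. Because only known family names run, nothing stored in the database is executed as code.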

how to recognize similar words with difference in spelling

I want to filter out duplicate customer names from a database. A single customer may have more than one entry in the system with the same name but with slight differences in spelling. For example, a customer named Brook may have three entries in the system with these variations:
Brook Berta
Bruck Berta
Biruk Berta
Let's assume we are putting this name in one database column.
I would like to know the different mechanisms for identifying such duplicates among, say, 100,000 records. We may use regular expressions in C# to iterate through all records, or some other pattern-matching technique, or we may export these records to whatever best fits such queries (e.g. SQL with regular expression capabilities).
This is what I thought of as a solution:
Write C# code to iterate through each record.
Get only the consonant letters in order (in the above case: BrKBrt).
Search for the same consonant pattern in the other records, treating similar-sounding letters such as (C, K), (C, S), (F, PH) as equivalent.
So please forward any ideas.
The Double Metaphone algorithm, published in 2000, is a new and improved version of the Soundex algorithm that was patented in 1918.
The article has links to Double Metaphone implementations in many languages.
Have a look at Soundex
There is a Soundex function in Transact-SQL (see http://msdn.microsoft.com/en-us/library/ms187384.aspx):
SELECT
SOUNDEX('brook berta'),
SOUNDEX('Bruck Berta'),
SOUNDEX('Biruk Berta')
returns the same value B620 for each of the example values
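For comparison outside the database, here is a minimal C# sketch of American Soundex. It is an illustration, not the T-SQL implementation: the class name Soundex and method Encode are hypothetical, and the sketch assumes that only the leading run of ASCII letters is encoded, so the first word of each name is what gets compared.

```csharp
using System;
using System.Text;

// Hypothetical helper implementing the classic American Soundex rules.
public static class Soundex
{
    // Soundex digit for each letter A..Z; '0' marks letters that are not coded.
    private const string Codes = "01230120022455012623010202";

    public static string Encode(string word)
    {
        var sb = new StringBuilder();
        char prevCode = '0';

        foreach (char raw in word)
        {
            char c = char.ToUpperInvariant(raw);
            if (c < 'A' || c > 'Z')
            {
                if (sb.Length > 0) break;   // stop at the first non-letter after the start
                continue;                   // skip leading non-letters
            }

            char code = Codes[c - 'A'];
            if (sb.Length == 0)
            {
                sb.Append(c);               // the first letter is kept as-is
                prevCode = code;
                continue;
            }

            if (c == 'H' || c == 'W') continue;  // H and W do not separate equal codes

            if (code != '0' && code != prevCode) sb.Append(code);
            prevCode = code;                // a vowel resets prevCode, so repeats across vowels count

            if (sb.Length == 4) break;
        }

        return sb.ToString().PadRight(4, '0');
    }
}
```

Under these assumptions, Soundex.Encode returns B620 for "Brook Berta", "Bruck Berta" and "Biruk Berta" alike, so the three entries fall into the same bucket.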
The obvious, established (and well documented) algorithms for finding string similarity are:
Levenshtein distance
Soundex
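Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal sketch using the standard two-row dynamic-programming table; the class name Levenshtein is hypothetical, and the comparison is case-sensitive, so names should be normalized first.

```csharp
using System;

// Hypothetical helper computing the edit distance between two strings.
public static class Levenshtein
{
    public static int Distance(string a, string b)
    {
        // previous[j] = distance between the first i-1 chars of a and the first j chars of b.
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1,   // insertion
                             previous[j] + 1),     // deletion
                    previous[j - 1] + cost);       // substitution or match
            }

            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }

        return previous[b.Length];
    }
}
```

For example, Levenshtein.Distance("Brook", "Bruck") is 2, since two letters are substituted. A small threshold relative to the name length could then flag candidate duplicates for review.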
I would consider writing something such as the "famous" python spell checker.
http://norvig.com/spell-correct.html
This will take a word and find all possible alternatives based on missing letters, adding letters, swapping letters, etc.
You might want to google "phonetic similarity algorithm"; you'll find plenty of information about this, including this article on CodeProject about implementing a solution in C#.
Look into Soundex. It's available as a standard library in most languages and does what you require, i.e. it algorithmically identifies phonetic similarity.
http://en.wikipedia.org/wiki/Soundex
There is a very nice R (just search for "R" in Google) package for Record Linkage. The standard examples target exactly your problem: R RecordLinkage
The C-Code for Soundex etc. is taken directly from PostgreSQL!
I would recommend Soundex and derived algorithms over Levenshtein distance for this problem. Levenshtein distance is more appropriate for spell-checking solutions, in my opinion.

Can I set auto-width on an Open XML SDK-generated spreadsheet without calculating the individual widths?

I'm working on creating an Excel file from a large set of data by using the Open XML SDK. I've finally managed to get a functional Columns node, which specifies all of the columns which will actually be used in the file. There is a "BestFit" property that can be set to true, but this apparently does not do anything. Is there a way to automatically set these columns to "best fit", so that when someone opens this file, they're already sized to the correct amount? Or am I forced to calculate how wide each column should be in advance, and set this in the code?
The way I understand the spec and this MSDN discussion, BestFit tells you the width was auto-calculated in Excel, but it does not tell Excel that it should calculate it again next time it is opened.
As "goodol" indicates in that discussion, I think the width can only be calculated when you display the column, since it depends on the contents, the font used, other style parameters... So even if you want to pre-calculate the width yourself, be aware that this is only an estimation, and it can be wrong if the contents contain lots of "wide" characters. Or does the Open XML SDK do this for you?
I'm using EPPlus, which I highly recommend. It took me a while to figure out how to do it with that library; here's what I came up with:
// Get your worksheet in the "sheet" variable
// Set columns to auto-fit
for (int i = 1; i <= sheet.Dimension.Columns; i++)
{
    sheet.Column(i).AutoFit();
}
