OnBase & C# script - c#

I recently worked on a project to import Outlook emails into the OnBase document management system. Now I am enhancing that project.
Each email we receive has a number in its subject line, and I want to extract it. For example, if the subject line is:
"My name is Hiren and my Driver license# 123456".
I want to pull out the substring 123456 to populate a "Driver License" keyword field in OnBase. The number is 6 digits long.
How can I do that?

This really has nothing to do with OnBase or any other integration. You simply need to know how to extract a number from a string. Where you store it is irrelevant. A simple way to do it would be using a regular expression:
var s = "My name is Hiren and my Driver license# 123456";
Regex r = new Regex(@"\d+"); // one or more consecutive digits
foreach (var match in r.Matches(s))
    Console.WriteLine(match);

You can do it in two ways: either with a regex, or by taking the last 6 characters of the subject line if you can guarantee the structure.
Here is an example:
http://dotnetpad.com/lB2pmbs7
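In case the link above stops working, here is a minimal sketch of both approaches. The class and method names (SubjectParser, FromRegex, FromTail) are made up for illustration, and the six-digit length is taken from the question. Both methods return an empty string when nothing suitable is found.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubjectParser
{
    // Approach 1: find the first run of exactly six digits anywhere in the subject.
    // The lookarounds stop it from matching part of a longer number.
    public static string FromRegex(string subject)
    {
        Match m = Regex.Match(subject, @"(?<!\d)\d{6}(?!\d)");
        return m.Success ? m.Value : string.Empty;
    }

    // Approach 2: take the last six characters. Only valid if the subject
    // always ends with the number.
    public static string FromTail(string subject)
    {
        string trimmed = subject.Trim();
        if (trimmed.Length < 6) return string.Empty;
        string tail = trimmed.Substring(trimmed.Length - 6);
        foreach (char c in tail)
        {
            if (c < '0' || c > '9') return string.Empty;
        }
        return tail;
    }

    public static void Main()
    {
        string subject = "My name is Hiren and my Driver license# 123456";
        Console.WriteLine(FromRegex(subject)); // 123456
        Console.WriteLine(FromTail(subject));  // 123456
    }
}
```

The regex version is the more forgiving of the two, since it still works if text follows the number.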

If you are using the Mailbox Importer, this is very easy: there is a keyword named "mail subject" that stores the subject of the email. You can then get the number with VBScript.
This could help you:
Create an action to create an expression, then type this VBScript:
right(%K00122;len(%K00122) - InStr(%K00122;"#"))
Replace "%K00122" with the number of the keyword that holds the subject.

Related

Replacing words in a Word document cause multiple times replacement with C#

I need to write a C#/.NET program that searches a Microsoft Word document for specific words and replaces them with other words. For example, my Word file contains the text "LeadSoft IT", which should be replaced with "LeadSoft IT Limited". The problem is that this only works the first time. If I run the program again, it matches "LeadSoft IT" inside the already replaced text, and the result becomes "LeadSoft IT Limited Limited". Can anyone suggest how to solve this in C#?
If you already have some script for this, feel free to post it and I'll try and help more.
I'm not sure what functionality you're using to find the text instance, but I would suggest looking into regex, and using something like (LeadSoft IT(?! Limited)).
Regex: https://regexr.com/
A good regex tester: https://www.regextester.com/109925
Edit: I made a Python script that uses regex to replace the instances:
import re
word_doc = "We like working " \
"here at Leadsoft IT.\n" \
"We are not limited here at " \
"Leadsoft It Limited."
replace_str = "Leadsoft IT Limited"
reg_str = '(Leadsoft IT(?!.?Limited))'
fixed_str = re.sub(reg_str, replace_str, word_doc, flags=re.IGNORECASE)
print(fixed_str)
# Prints:
# We like working here at Leadsoft IT Limited.
# We are not limited here at Leadsoft It Limited.
Edit 2: Code re-created in C#: https://gist.github.com/Zylvian/47ecd6d1953b8d8c3900dc30645efe98
The regex checks the entire string for instances where Leadsoft IT is NOT followed by Limited, and for all those instances, replaces Leadsoft IT with Leadsoft IT Limited.
The regex uses what's called a "negative lookahead (?!)" which makes sure that the string to the left is not followed by the string to the right. Feel free to edit the regex how you see fit, but be aware that the matching is very strong.
If you want to understand the regex string better, feel free to copy it into https://www.regextester.com/.
Let me know if that helps!
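For readers who would rather not follow the gist link, a minimal C# sketch of the same negative-lookahead idea is below. It works on a plain string; reading the text from and writing it back to the Word document is not shown. The class name CompanyNameFixer and the added \b word boundary are my own choices, not taken from the gist.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompanyNameFixer
{
    // Matches "LeadSoft IT" only when it is not already followed by " Limited".
    // \b keeps it from matching inside a longer word such as "LeadSoft ITalia".
    private static readonly Regex Pattern =
        new Regex(@"LeadSoft IT\b(?!\s+Limited)", RegexOptions.IgnoreCase);

    public static string Fix(string text)
    {
        return Pattern.Replace(text, "LeadSoft IT Limited");
    }

    public static void Main()
    {
        string once = Fix("We work at LeadSoft IT.");
        string twice = Fix(once); // running it again changes nothing
        Console.WriteLine(once);  // We work at LeadSoft IT Limited.
        Console.WriteLine(twice); // We work at LeadSoft IT Limited.
    }
}
```

Because the pattern skips occurrences that already carry the suffix, the replacement can be run any number of times with the same result.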
Simplistically, you can just run another replace to fix the problem you cause:
s = s.Replace("LeadSoft IT", "LeadSoft IT Limited").Replace("LeadSoft IT Limited Limited", "LeadSoft IT Limited");
If you're after a more generic fix that doesn't hard-code the problem string, check whether the search string occurs inside the replacement string, since that is what causes the problem. If it does, run a second replacement on the document that finds the result of applying the replacement to the replacement string itself:
var find = "LeadSoft IT";
var repl = "LeadSoft IT Limited";
var result = document.Replace(find, repl);
var problemWillOccur = repl.Contains(find);
if(problemWillOccur){
var fixProblemByFinding = repl.Replace(find, repl); //is "LeadSoft IT Limited Limited"
result = result.Replace(fixProblemByFinding, repl);
}
You may be interested in how I solved this problem.
At first I was using NPOI, but it was making a mess of the document, so I looked further and discovered that a DOCX file is simply a ZIP archive containing XML files.
https://github.com/kubala156/DociFlow/blob/main/DociFlow.Lib/Word/SeekAndReplace.cs
Usage:
var vars = new Dictionary<string, string>()
{
    { "testtag", "Test tag value" }
};
using (var doci = new DociFlow.Lib.Word.SeekAndReplace())
{
// test.docx contains text with tag "{{testtag}}" it will be replaced with "Test tag value"
doci.Open("test.docx");
doci.FindAndReplace(vars, "{{", "}}");
}
NPOI 2.5.4 provides a ReplaceText method to help you replace placeholders in a Word file.
Here is an example.
https://github.com/nissl-lab/npoi-examples/blob/main/xwpf/ReplaceTexts/Program.cs

Reverse RegExp from user entered string ( C#)

Is it possible to generate regular expressions from a user entered string? Are there any C# libraries to do this?
For example a user enters a string e.g. ABCxyz123 and the C# code automatically generates [A-Z]{3}[a-z]{3}\d{3}.
This is a simple string but we could have more complicated strings like
MON-0123/AB/5678-abc 2/7
Or
1234-678/abc::1234ABC?246
I already have a string tokeniser (from a previous stackoverflow question) so I could construct a regex from the list of tokens.
But I was wondering if there is a lib or C# code out there that’ll do it.
Edit: Important, I should have also said: it's not the actual characters in the string that are important, but the type of each character and how many there are.
e.g A user could enter a "pattern" string of ABCxyz123.
This would be interpreted as
3 upper case alphas followed by
3 lower case alphas followed by
3 digits
So other users must then enter strings that match the compiled pattern [A-Z]{3}[a-z]{3}\d{3}, e.g. QAZplm789.
It's the format of the user-entered strings that needs to be checked, not the actual content, if that makes sense.
Jerry has a related link
creating a regular expression for a list of strings
There are a few other links off this.
I'm not trying to do anything complicated e.g NLP etc.
I could use a C# expression builder and dynamic LINQ at a push, but that seems overkill and a code maintenance nightmare.
I'll write my own "simple" regex builder from the tokenized string.
Example Use Case:
An admin office user where I work could set up the string patterns for each field by typing a sample string. My code converts this to a regex, and I store these in a database.
E.g. field one requires digits at the start. If there are 2 digits, send to workflow 1; if 3, send to workflow 2. I could simply check the number of characters with a substring or whatever, but that would be a hard-coded solution.
I am trying to do this generically for multiple documents with multiple fields. Also, each field could have multiple format checkers.
I don't want to write specific C# checks for every single field in numerous documents.
I'll get on with it, should keep me amused for a couple of days.
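A minimal sketch of the "simple" builder described above, assuming only ASCII letters and digits get a character class and every other character is kept as an escaped literal. PatternBuilder, Classify and FromSample are made-up names, and the ^ and $ anchors are an addition so the pattern validates the whole input.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // Maps one character to the regex fragment for its "type".
    private static string Classify(char c)
    {
        if (c >= 'A' && c <= 'Z') return "[A-Z]";
        if (c >= 'a' && c <= 'z') return "[a-z]";
        if (c >= '0' && c <= '9') return @"\d";
        return Regex.Escape(c.ToString()); // separators are kept as literals
    }

    // Collapses each run of identical fragments into fragment{n}.
    public static string FromSample(string sample)
    {
        var sb = new StringBuilder("^");
        int i = 0;
        while (i < sample.Length)
        {
            string fragment = Classify(sample[i]);
            int run = 1;
            while (i + run < sample.Length && Classify(sample[i + run]) == fragment)
            {
                run++;
            }
            sb.Append(fragment);
            if (run > 1) sb.Append('{').Append(run).Append('}');
            i += run;
        }
        return sb.Append('$').ToString();
    }

    public static void Main()
    {
        string pattern = FromSample("ABCxyz123");
        Console.WriteLine(pattern);                             // ^[A-Z]{3}[a-z]{3}\d{3}$
        Console.WriteLine(Regex.IsMatch("QAZplm789", pattern)); // True
    }
}
```

The generated string can be stored in the database as-is and later passed to Regex.IsMatch for each field check.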

Dynamic Regex for number range using c#

I'm looking at UK postcodes and trying to work out how I can take data from a database (the first part of a UK postcode) and dynamically create a regexp for them using c#. For example:
AB44-56
I know what I want as an output:
AB([4][4-9]|[5][0-6])+
However, I can't work out how to do this with logic. Perhaps I need to split the letters from the numbers first, but I can't do that using Split.
I have other combinations too - single range:
AB31 would be AB[3][1]+
Some with just letters:
BT would be BT+
Some with a single letter and 1 or two numbers:
G83 Would be G[8][3]
Any suggestions or guidance on how this might be coded would be very much appreciated.
From Wikipedia, UK postal codes:
This can be generalised as: (one or two letters)(number between 0 and 99)(zero or one letter)(space)(single digit)(two letters)
so
^[A-Za-z]{1,2}\d+[A-Za-z]?\s\d[A-Za-z]{2}$
might work.
EDIT: Also, if you are trying to restrict the postal codes to, say, those with the same prefix as the ones in the database, you could do this.
var source = "BTasdfweasdf"; // from the database
var input = "BT1A 1BB"; // from somewhere else
// keep the leading letters of the source and append the generic postcode tail
var regex = Regex.Replace(source, @"(^[A-Za-z]{1,2})(.*)", @"$1\d+[A-Za-z]?\s\d[A-Za-z]{2}$");
var match = Regex.Match(input, regex);
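For the range case in the question (AB44-56), one possible approach is to emit one alternative per tens digit. This is only a sketch under stated assumptions: district numbers are limited to 0-99, and PostcodePattern and Build are made-up names. The output also differs slightly from the pattern in the question: it drops the trailing + and adds lookarounds so that "AB4" does not match the start of "AB44" and "G" does not match "GU1".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PostcodePattern
{
    // Accepts "BT", "G83", "AB31" or "AB44-56" (numbers limited to 0-99).
    private static readonly Regex Spec =
        new Regex(@"^([A-Za-z]{1,2})(?:(\d{1,2})(?:-(\d{1,2}))?)?$");

    public static string Build(string spec)
    {
        Match m = Spec.Match(spec);
        if (!m.Success) throw new ArgumentException("Unrecognised spec: " + spec);

        string area = m.Groups[1].Value.ToUpperInvariant();
        if (!m.Groups[2].Success)
        {
            return "^" + area + @"(?=\d)"; // letters only: any district in the area
        }

        int lo = int.Parse(m.Groups[2].Value);
        int hi = m.Groups[3].Success ? int.Parse(m.Groups[3].Value) : lo;
        if (lo > hi) throw new ArgumentException("Range is reversed: " + spec);

        // One alternative per tens digit, e.g. 44-56 gives 4[4-9] and 5[0-6].
        var parts = new List<string>();
        for (int tens = lo / 10; tens <= hi / 10; tens++)
        {
            int from = tens == lo / 10 ? lo % 10 : 0;
            int to = tens == hi / 10 ? hi % 10 : 9;
            string units = from == to ? from.ToString() : "[" + from + "-" + to + "]";
            parts.Add((tens == 0 ? "" : tens.ToString()) + units);
        }

        // (?!\d) rejects a longer district number that merely starts the same way.
        return "^" + area + "(?:" + string.Join("|", parts) + @")(?!\d)";
    }

    public static void Main()
    {
        string pattern = Build("AB44-56");
        Console.WriteLine(pattern);                            // ^AB(?:4[4-9]|5[0-6])(?!\d)
        Console.WriteLine(Regex.IsMatch("AB51 5XY", pattern)); // True
        Console.WriteLine(Regex.IsMatch("AB57 5XY", pattern)); // False
    }
}
```

The generated patterns expect upper-case input; pass RegexOptions.IgnoreCase when matching if that cannot be guaranteed.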

Proximity Search example Lucene.Net

I want to do a proximity search with Lucene.Net. I saw this question where it looks like that was the answer for him, but no code was supplied. The Java documentation says to use the ~ character with the number of words in between, but I don't see where this character would go in the code. Can anyone give me an example of a proximity search using Lucene.Net?
Edit:
What I have so far:
IndexSearcher searcher = new IndexSearcher(this.Directory, true);
string[] fieldList = new string[] { "Name", "Description" };
List<BooleanClause.Occur> occurs = new List<BooleanClause.Occur>();
foreach (string field in fieldList)
{
occurs.Add(BooleanClause.Occur.SHOULD);
}
Query searchQuery = MultiFieldQueryParser.Parse(this.LuceneVersion, query, fieldList, occurs.ToArray(), this.Analyzer);
If I add "~" with any number to the query passed to MultiFieldQueryParser, it throws an error saying that for a fuzzy search the value should be between 0.0 and 1.0. What I want is a proximity search with up to 3 words of separation, e.g. "my search"~3.
The tilde means either a fuzzy search if you apply it on a single term, or a proximity search if you apply it on a phrase. The error you're receiving sounds like you're applying it on a single term (term~10) instead of using a phrase ("term term"~10).
To do a proximity search use the tilde, "~", symbol at the end of a Phrase.
The only differences between Lucene.NET and classic Java Lucene of the same version should be internal, not external. The operational goal is to have a very compatible project, especially on the input (queries) and output (index files) side. So it should work however it works for Java Lucene. If it doesn't, it is a bug.

C# Need to locate web addresses using REGEX is that possible?

Basically I need to parse a string prior to loading it into a WebBrowser
myString = "this is an example string http://www.google.com , and I need to make the link clickable";
webBrow.DocumentText = myString;
Basically what I want to happen is a replace of the web address so that it looks like a hyperlink, and do this with any address pulled in to the string. I would need to replace the web address so that web address would read like
<a href='web address'>web address</a>
This would allow me to have the links clickable..
Any Ideas?
new Regex(@"https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?").Match(myString)
It's possible depending on how strict or permissive you want your parsing to be.
As a first cut, you can try @"\bhttp://\S+" which will match any string starting with "http://" at a word boundary (that is, preceded by a non-word character such as whitespace or punctuation, or at the start of the string).
To search using a regex and replace all occurrences with your custom text, you could use the Regex.Replace method.
You may want to read up on Regular Expression Language Elements to learn more.
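A minimal sketch of the Regex.Replace approach, using the permissive first-cut pattern extended to accept https as well. The Linkifier class and Linkify method are made-up names. The pattern does not strip trailing punctuation or HTML-encode the URL, so treat it as a starting point.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // Deliberately permissive: "http://" or "https://" followed by non-space characters.
    private static readonly Regex UrlPattern = new Regex(@"\bhttps?://\S+");

    public static string Linkify(string text)
    {
        // $0 in the replacement stands for the whole matched URL.
        return UrlPattern.Replace(text, "<a href='$0'>$0</a>");
    }

    public static void Main()
    {
        string myString = "this is an example string http://www.google.com , and I need to make the link clickable";
        Console.WriteLine(Linkify(myString));
    }
}
```

The result of Linkify(myString) can then be assigned to webBrow.DocumentText as in the question.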
