Detecting a PO Box (address) in a C# string - c#

I am working on a project where users have to enter the physical address of their organization. In many cases users will put in a PO Box rather than their physical address. I need a way in C# to determine whether a user entered a P.O. Box or PO Box (or any other variation of this) rather than a 29 Maple Street style address. I have had a few thoughts, but I thought I would get some really great feedback here.
Thanks

I would try to split the address string into tokens, then look for 'P.O. Box' or 'PO Box' in the resulting array: if it is found, the box number should be in the next element(s).
You will also need a way to detect the city so you know when to stop. You could use GeoNames (http://www.geonames.org/) as a database.
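If you only need to flag PO Box-style input rather than fully parse the address, a regular expression may be enough. Here is a minimal sketch; the helper name is illustrative and the pattern only covers the common variants:

using System.Text.RegularExpressions;

// Minimal sketch: flags input that looks like a PO Box ("PO Box", "P.O. Box",
// "P O Box", "POBox", "Post Office Box", ...). Only common variants are covered.
public static class AddressChecks
{
    private static readonly Regex PoBoxPattern = new Regex(
        @"\bP\.?\s*O\.?\s*Box\b|\bPost\s+Office\s+Box\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool LooksLikePoBox(string address) =>
        !string.IsNullOrWhiteSpace(address) && PoBoxPattern.IsMatch(address);
}

For example, LooksLikePoBox("P.O. Box 123") returns true and LooksLikePoBox("29 Maple Street") returns false.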

Related

Weighing Scale Barcode Reader C#

How can I convert an EAN-13 or Code-128 barcode generated by a weighing scale machine back into a user-defined object?
class Product
{
    string name = "Apples";
    decimal qty = 0.5m;
    double value = 5;
}
I have already found libraries, but they all decode a barcode provided as an image. In my case, I have a barcode reader that reads the barcode label and inputs it as numbers, something like 2032156478954.
What library can I use, or how can I decode those barcode numbers back into my object?
Assume that I know from the user manual of the weighing scale which part is the product name, qty, and value.
This is just like the barcode labels we see in hypermarkets where you buy fruits and veggies by the kilogram or gram: the scale prints a barcode label, and the POS converts that label back into a product object.
I am totally new when it comes to handling barcodes in .NET; any help, suggestion, or advice will be appreciated.
Example of Weighing Scale Barcode
Currently, I have solved it by implementing my own solution.
Assume the barcode is 2 53647 2 5262 9 (EAN-13).
From the left-hand side, the leading 2 tells the POS this is a barcode from the weighing scale machine, and 53647 is the ID of the item in the database.
The second 2 tells the POS that the next 5 digits are the price of the item (52.62).
The last digit is always discarded.
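A minimal sketch of that parsing scheme in C# follows; the type and member names are illustrative, and the exact offsets depend on how your particular scale is configured:

// Minimal sketch of the scheme above: a leading '2' marks a weighing-scale barcode,
// the next 5 digits are the item ID, the digit after that flags a price, and the
// remaining digits (minus the discarded trailing digit) are the price in cents.
public static class ScaleBarcode
{
    public static bool TryParse(string barcode, out string itemId, out decimal price)
    {
        itemId = null;
        price = 0m;
        if (string.IsNullOrEmpty(barcode) || barcode.Length < 9 || barcode[0] != '2')
            return false;                                   // not a scale barcode in this scheme

        itemId = barcode.Substring(1, 5);                   // e.g. "53647"
        string priceDigits = barcode.Substring(7, barcode.Length - 8); // skip flag, drop last digit
        price = decimal.Parse(priceDigits) / 100m;          // e.g. "5262" -> 52.62
        return true;
    }
}

For the example above, TryParse("253647252629", out var id, out var p) yields id = "53647" and p = 52.62.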
The downside of this method is that for every new setup you either have to change the settings of your weighing machine to match your parsing, or change your code to match how the machine prints barcodes, since there is no single international standard.
I was hoping for a library with built-in functionality to recognize and decode these barcodes based on the leading digits and check digits. I might start building my own after looking at the most commonly used formats.
If you already have the string, as others pointed out, you theoretically just need to split the barcode and fill your class.
If you have a look here:
https://www.codeproject.com/Articles/10162/Creating-EAN-13-Barcodes-with-C
It shows you what the individual numbers mean.
However:
If you want to figure out the values behind the numbers, that's a little bit tricky. I expect the manufacturer code, if it is internationally standardized, is something that will change over time, because someone registers a new manufacturer and therefore gets a new code.
This would imply your program needs access to the internet, or rather to the database where the manufacturers are registered.
Before putting too much effort into it, ask yourself:
Do I really need this information that well prepared for the project I'm doing, or would it be completely fine to just split the string and end up with, for example, "50603" as the manufacturer without knowing what is behind it?
I'm only giving this example for the EAN code, but I would say you can apply it to other codes as well.

c# auto processing email according to rules

I need a pattern for processing incoming emails.
My current pseudo-code is like this:
if sender is a#a.com and messageBody contains "aaa" then
extract the content according to the aaa function
save it to database
move the message to the archive
else if messageBody contains "bbb" then
extract the content according to bbb function
save it to database
inform sender
move the message to archive
else if messageBody NOT contains "ccc" and sender is "sender#ccc.com" then
leave message in the inbox so it will be manually processed
else if ...
...
So I ended up with a big function with thousands of lines.
How can I make this thing simpler?
Thanks in advance
A very good architecture is required to solve this problem; machine learning is one of the best solutions for this kind of problem. But there are still some things that you can take care of in order to make it simpler (a table-driven sketch follows the list below):
Rather than writing 10 ifs for 10 email IDs, create a list of unwanted senders (which will go to spam)
Create a list of unwanted subjects
Create groups of time intervals and process emails accordingly: morning emails, noon emails, evening emails, etc.
Create a has-attachment check
Create a no-subject check
Create a no-body check
Create a friends email list
Create a same-domain sender check
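One way to keep the dispatch from turning into another giant if/else chain is a table-driven rule list: each rule pairs a condition with an action, and every new case is just a new entry. A minimal sketch, with illustrative Rule and EmailMessage types (not tied to any particular mail library):

using System;
using System.Collections.Generic;
using System.Linq;

public class EmailMessage
{
    public string Sender { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}

public class Rule
{
    public Func<EmailMessage, bool> Matches { get; set; }
    public Action<EmailMessage> Apply { get; set; }
}

public class EmailProcessor
{
    private readonly List<Rule> _rules = new List<Rule>
    {
        new Rule
        {
            Matches = m => m.Sender == "a@a.com" && m.Body.Contains("aaa"),
            Apply   = m => { /* extract with the aaa function, save to database, archive */ }
        },
        new Rule
        {
            Matches = m => m.Body.Contains("bbb"),
            Apply   = m => { /* extract with the bbb function, save, inform sender, archive */ }
        }
        // ...one entry per rule instead of another else-if branch
    };

    public void Process(EmailMessage message)
    {
        // First matching rule wins; messages that match no rule stay in the inbox
        // for manual processing.
        var rule = _rules.FirstOrDefault(r => r.Matches(message));
        if (rule != null)
            rule.Apply(message);
    }
}

The rules can also be loaded from a database or configuration instead of being hard-coded, which keeps the dispatch loop unchanged as rules are added.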
Thanks. :)

Google Maps- Plotting markers will only work in some cities

I am building a travel organiser application in ASP.NET / C#.
At the moment, the user types in their destination, and my application sends the latitude and longitude to the Google Places API, which returns a list of hotels in the destination city.
The application then plots markers on Google Map (v3) for the hotels, but strangely only for some (small) cities. If I try a major city, or even a large town, the map just won't appear at all.
If 20 results are returned for hotels in Reykjavik, the hotels will be shown without a problem. If 20 results are returned for Dublin, Paris, or Glasgow.. (I think you get the picture!), the map won't show.
I have noticed that hotels in these small cities seem to be in a fairly concentrated area, so I have tried zooming out for larger cities, but that still won't work.
Does anybody have any idea why this would be?
Many thanks.
I found the solution to this problem.
The issue was that I was not escaping apostrophe characters when I was reading in hotel and bar names from the Yelp API.
The reason that some cities were displaying and others were not is down to the general language used in that particular locale. Places like Dublin and Paris tend to have a higher number of businesses with an apostrophe in their name (e.g. O'Haras, O'Reillys, L'Entre Potes, etc.) than, say, Reykjavik or Oslo, which was causing the map script to crash only in certain cities.
For those who didn't know, like me, you can escape apostrophes with a backslash.
alert('O\'Neils Bar, Dublin');
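Since the names come from server-side C# before being written into the page's script, it may be cleaner to escape them there. A minimal sketch; the helper name is illustrative:

// Minimal sketch: escape backslashes and single quotes so a server-side value can be
// embedded safely inside a JavaScript single-quoted string literal.
public static class JsEscape
{
    public static string SingleQuoted(string value)
    {
        if (value == null) return string.Empty;
        return value.Replace("\\", "\\\\").Replace("'", "\\'");
    }
}

JsEscape.SingleQuoted("O'Neils Bar, Dublin") produces the text O\'Neils Bar, Dublin, which is safe to drop into the generated script.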

Efficient algorithm for finding related submissions

I recently launched my humble side project and would like to add a "related submissions" section when viewing a submission. Exactly like what SO is doing here - see right column, titled "Related"
Considering that each submission has a title and a set of tags, what is the most effective (optimum result) and most efficient (fast, memory-friendly) way to query the database for related submissions?
I can think of one way to do this (which I'll post as an answer) but I'm very interested to see what others have to say. Or perhaps there's already a standard way of achieving this?
Here's my two-cent solution:
To achieve the best output, we need to put “weight” on the query results.
To start with, each submission in the database is assumed to have a weight of zero.
Then, if a submission in the "pool" shares one tag with the current submission, we add +3 to that submission's weight. So if another submission is found that shares two tags with the current submission, we add +6 to its weight.
Next, we split/tokenize the title of the current submission and remove “stop words”.
I've seen a list of stop words from Google, but for now I'll define my stop words to be: ["of", "a", "the", "in"]
Example:
Title: "The Best Submission of All Times"
Resulting array: ["The", "Best", "Submission", "of", "All", "Times"]
Remove stop words: ["Best", "Submission", "All", "Times"]
Then we query the database for submissions whose titles contain any of the remaining words, and for each match we add +2 to the weight.
And finally sort the list descending by weight and take the top N results.
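A rough sketch of that weighting in C#; the Submission type is illustrative, the scores (+3 per shared tag, +2 per shared title word) follow the description above, and in practice you would push as much of this as possible into the database query:

using System;
using System.Collections.Generic;
using System.Linq;

public class Submission
{
    public int Id { get; set; }
    public string Title { get; set; }
    public HashSet<string> Tags { get; set; }
}

public static class RelatedFinder
{
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "of", "a", "the", "in" };

    public static List<Submission> FindRelated(Submission current, IEnumerable<Submission> pool, int topN)
    {
        // Tokenize the current title and drop stop words.
        var titleWords = new HashSet<string>(
            current.Title.Split(' ').Where(w => !StopWords.Contains(w)),
            StringComparer.OrdinalIgnoreCase);

        return pool
            .Where(s => s.Id != current.Id)
            .Select(s => new
            {
                Submission = s,
                Weight = 3 * s.Tags.Count(t => current.Tags.Contains(t))
                       + 2 * s.Title.Split(' ').Count(w => titleWords.Contains(w))
            })
            .Where(x => x.Weight > 0)
            .OrderByDescending(x => x.Weight)
            .Take(topN)
            .Select(x => x.Submission)
            .ToList();
    }
}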
What do you think? (be gentle!)
If I understand correctly, you need a technique to determine whether two posts are "similar" to each other. You may want to use a probabilistic model for that:
http://en.wikipedia.org/wiki/Mutual_information
The idea would be to say that if two posts share a lot of "uncommon" words, they are probably about the same topic. For detecting uncommon words, depending on your application, you may use a general frequency table, or better, build one yourself from the words in your own posts (but you will need enough of them to get something relevant).
I would not limit myself to title and tags, but I would give them extra weight in the search.
This kind of idea is very common in spam filtering. Unfortunately I don't have the time to do a full review, but a quick Google search gives:
http://www.aclweb.org/anthology/P/P04/P04-3024.pdf
karlmicha.googlepages.com/acl2004_poster.pdf

Address Match Key Algorithm

I have lists of addresses in two separate tables that are slightly off, and I need to be able to match them. For example, the same address can be entered in multiple ways:
110 Test St
110 Test St.
110 Test Street
Although this example is simple, you can imagine the situation in more complex scenarios. I am trying to develop a simple algorithm that will be able to match the above addresses to a single key.
For example, the key might be "11TEST" - the first two characters of 110, the first two of Test, and the first two of the street variant (St, St. and Street all give "ST"). A full match key would also include the first 5 digits of the zip code, so in the above example the full key might look like "11TEST44680".
I am looking for ideas for an effective algorithm or resources I can look at for considerations when developing this. Any ideas can be pseudocode or in your language of choice.
We are only concerned with US addresses. In fact, we are only looking at addresses from 250 zip codes in Ohio and Michigan. We also do not have access to any postal software, although we would be open to ideas for cost-effective solutions (it would essentially be a one-time use). Please be mindful that this is an initial dump of data from a government source, so suggestions for how users can clean it are helpful as I build out the application, but I would love the best initial match I can possibly get by matching addresses as well as possible.
I'm working on a similar algorithm as we speak; it should handle addresses in Canada, the USA, Mexico and the UK by the time I'm done. The problem I'm facing is that they're stored in our database in a 3-field plaintext format [whoever thought that was a good idea should be shot IMHO], so trying to handle rural routes, general deliveries, large-volume receivers, multiple countries, province vs. state vs. county, postal codes vs. zip codes, and spelling mistakes is no small or simple task.
Spelling mistakes alone were no small feat - especially when you get to countries that use French names - matching Saint, Sainte, St, Ste, Saints, Saintes, Sts, Stes, Grand, Grande, Grands, Grandes, with or without a period or hyphenation, to the larger part of a name causes no end of performance issues - especially when St could mean saint or street and may or may not have been entered in the correct context (i.e. feminine vs. masculine). And what if the address has largely been entered correctly but has an incorrect province or postal code?
One place to start your search is the Levenshtein distance algorithm, which I've found to be really useful for eliminating a large portion of spelling mistakes. After that, it's mostly a case of searching for keywords and comparing against a postal database.
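For reference, a minimal sketch of that distance function in C# (the classic dynamic-programming version; production code would more likely use the single-row optimization or an existing library):

using System;

public static class Fuzzy
{
    // Levenshtein (edit) distance: the number of single-character insertions,
    // deletions and substitutions needed to turn one string into the other.
    // Useful for catching variants such as "Sainte" vs "Saint".
    public static int LevenshteinDistance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];

        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // deletion
                             d[i, j - 1] + 1),     // insertion
                    d[i - 1, j - 1] + cost);       // substitution
            }
        }
        return d[a.Length, b.Length];
    }
}

For example, LevenshteinDistance("Sainte", "Saint") is 1, so a small threshold (1 or 2) catches most typos without merging genuinely different street names.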
I would be really interested in collaborating with anyone that is currently developing tools to do this, perhaps we can assist each other to a common solution. I'm already part of the way there and have overcome all the issues I've mentioned so far, having someone else working on the same problem would be really helpful to bounce ideas off.
Cheers -
[ben at afsinc dot ca]
If you would prefer not to develop one and would rather use an off-the-shelf product that uses many of the technologies mentioned here, see: http://www.melissadata.com/dqt/matchup-api.htm
Disclaimer: I had a role in its development and work for the company.
In the UK we would use:
House Name or Number (where name includes Flat number for apartment blocks)
Postcode
You should certainly be using the postcode, but in the US I believe your Zip codes cover very wide areas compared to postcodes in the UK. You would therefore need to use the street and city.
Your example wouldn't differentiate between 11 Test Street, 110 - 119 Test Street, etc.
If your company has access to an address lookup system, I would run all the data through that to get the data back in a consistent format, possibly with address keys that can be used for matching.
If I were to take a crack at this, I'd convert each address string into a tree using a pre-defined order of operations.
Eg. 110 Test Street Apt 3. Anywhere California 90210 =>
Get the type of address. E.g. street addresses have different formats than rural route addresses, and this differs by country.
Given that this is a street address, get the string that represents the type of street and convert that to an enum (eBoulevard, eRoad, etc..)
Given that this is a street address, pull out the street name (store in lower case)
Given that this is a street address, pull out the street number
Given that this is a street address, look for any apartment number (could be before the street number with a dash, could be after "Apt.", etc...)
eStreet                    //1. an enum of possible address types eg. eStreet, eRuralRoute,...
   |
eStreet                    //2. an enum of street types eg. eStreet, eBlvd, eWay,...
 /     |     \
Name  Number  Apt
 |      |      |
test   110     3
Eg. RR#3 Anywhere California 90210 =>
Get the type of address: rural route
Given that this is a rural route address, get the route number
eRuralRoute
|
3
You'll need to do something similar for the country, state and zip information.
Then compare the resulting trees.
This makes the comparison very simple; however, the code to generate the trees is very tricky. You'd want to test the crap out of it on thousands and thousands of addresses. Your problem is simpler if it is only US addresses you care about; British addresses, as already mentioned, are quite different, and Canadian addresses may have French in them (eg. Place D'Arms, Rue Laurent, etc...)
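Once parsed, the "tree" is essentially a structured record. A rough sketch of what that structure could look like is below; all type and member names are illustrative, and the hard part - turning a raw string into these fields - is deliberately omitted:

// Rough sketch of the parsed-address structure described above.
public enum AddressKind { Street, RuralRoute /*, ... */ }
public enum StreetType { Street, Boulevard, Road, Way /*, ... */ }

public class ParsedAddress
{
    public AddressKind Kind { get; set; }
    public StreetType? StreetKind { get; set; }   // only set for street addresses
    public string StreetName { get; set; }        // stored lower-case, e.g. "test"
    public string StreetNumber { get; set; }      // e.g. "110"
    public string ApartmentNumber { get; set; }   // e.g. "3"
    public string RouteNumber { get; set; }       // only set for rural routes, e.g. "3"

    // Two addresses match when their normalized fields are equal;
    // city/state/zip comparison would be added the same way.
    public bool Matches(ParsedAddress other) =>
        Kind == other.Kind &&
        StreetKind == other.StreetKind &&
        StreetName == other.StreetName &&
        StreetNumber == other.StreetNumber &&
        ApartmentNumber == other.ApartmentNumber &&
        RouteNumber == other.RouteNumber;
}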
If it is cost-effective for your company to write its own address normalization tool then I'd suggest starting with the USPS address standard. Alternatively, there are any number of vendors offering server side tools and web services to normalize, correct and verify addresses.
My company uses AccuMail Gold for this purpose because it does a lot more than just standardize & correct the address. When we considered the cost of even one week's worth of salary to develop a tool in-house the choice to buy an off-the-shelf product was obvious.
If you don't choose to use an existing system, one idea is to do the following:
Extract numbers from the address line
Replace common street words with blanks
Create the match string
i.e. for "555 Canal Street":
Extract number gives "555" + "Canal Street"
Replace street words gives "555" + "Canal"
Create match string gives "555Canal"
"Canal st 555" would give the same match string.
By street words I mean words and abbreviations for "street" in your language; for example "st", "st.", "blv", "ave", "avenue", etc. are all removed from the string.
By extracting the numbers and separating them from the string, it does not matter whether they come first or last.
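A minimal sketch of that match-string idea in C#; the street-word list is only a starting point and the class and method names are illustrative:

using System;
using System.Collections.Generic;
using System.Linq;

public static class AddressMatchKey
{
    private static readonly HashSet<string> StreetWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "street", "st", "st.", "blv", "blvd", "ave", "avenue", "drive", "dr" };

    public static string Create(string addressLine)
    {
        var tokens = addressLine.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        // Digits and remaining words are concatenated separately, so word order
        // ("555 Canal Street" vs "Canal st 555") does not matter.
        var numbers = string.Concat(tokens.Where(t => t.All(char.IsDigit)));
        var words   = string.Concat(tokens.Where(t => !t.All(char.IsDigit) &&
                                                      !StreetWords.Contains(t)));
        return (numbers + words).ToUpperInvariant();
    }
}

Create("555 Canal Street") and Create("Canal st 555") both return "555CANAL"; appending the first 5 digits of the zip code, as suggested in the question, would make the key more selective.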
Use an identity column for the primary key; this will always be unique and will make it easier to merge duplicates later.
Force proper data entry with the user interface. Make users enter each component in its own text box: the house number in its own box, the street name in its own box, the city in its own box, the state from a select list, etc. This will make looking for matches easier.
Have a two-step "save":
After the initial save, do a search to look up matches and present the user with a list of possible matches as well as the new entry.
If they select the new one, save it; if they pick an existing one, use that ID.
Clean the data. Try to strip out "street", "st", "drive", etc. and store it as a StreetType char(1) with a FK to a table containing the proper abbreviations, so you can rebuild the street later.
Look into SOUNDEX and DIFFERENCE.
I have worked at large companies that maintain mailing lists, and they did not attempt to do this automatically; they used people to filter out the new entries from the duplicates because it is so hard to do. Plan for a merge feature so you can manually merge duplicates when they occur, and ripple the values through the PKs.
You might look into the Google Maps API and see if you can pass in your address and get a match back. I'm not familiar with it; this is just speculation.
