This question already has answers here:
RegEx for matching UK Postcodes
(33 answers)
Closed 3 years ago.
I need to check a UK postcode against a list.
UK postcodes follow a standard format, but the list contains only the outward section (the outcode) that I need to check against.
Each entry in the list pairs an outward code with some data relating to it, for example:
AL St Albans
B Birmingham
BT Belfast
TR Taunton
TR21 Taunton X
TR22 Taunton Y
My aim is that when I get a postcode, for example B20 7TP, I can search and find Birmingham.
Any ideas??
This question is different from the ones linked as possible answers: in my case I need to check a full postcode against just the outward code.
If you have the whole postcode and only want to use the outcode, remove the last three characters and use what remains. All postcodes end with the pattern digit-alpha-alpha, so removing those characters will give the outcode; any string that does not fit that pattern or that does not give a valid outcode after removing that substring is not a valid postcode. (Source)
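The "remove the last three characters" rule can be sketched in C# as follows. This is a minimal illustration, not a full postcode validator: the class and method names (`Postcode`, `TryGetOutcode`) are made up for this example, and it only checks that the input ends in digit-alpha-alpha, not that the resulting outcode really exists.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Postcode
{
    // Incode: one digit followed by two letters at the very end,
    // optionally preceded by the separating whitespace.
    private static readonly Regex Incode =
        new Regex(@"\s*[0-9][A-Za-z]{2}$", RegexOptions.CultureInvariant);

    // Returns true and the upper-cased outcode when the input ends with
    // a digit-alpha-alpha incode and something remains in front of it.
    public static bool TryGetOutcode(string postcode, out string outcode)
    {
        outcode = string.Empty;
        if (string.IsNullOrWhiteSpace(postcode)) return false;

        string trimmed = postcode.Trim();
        Match m = Incode.Match(trimmed);
        if (!m.Success || m.Index == 0) return false;

        outcode = trimmed.Substring(0, m.Index).ToUpperInvariant();
        return true;
    }
}
```

With this sketch, "B20 7TP" and "b207tp" both give the outcode B20, while "TR0 00" is rejected because it does not end in digit-alpha-alpha. The outcode still has to be checked against the list of known codes.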
If you're willing to take on an external (and Internet-based) dependency, you could look at using something like https://postcodes.io, in particular the outcodes section of that API. I have no affiliation with postcodes.io; I just found it after a Google.
Per the documentation, /outcodes will return
the outcode
the eastings
the northings
the administrative counties under the code
the district/unitary authorities under the code
the administrative/electoral areas under the code
the WGS84 longitude
the WGS84 latitude
the countries included in the code
the parish/communities in the code
For reference, a call to /outcodes/TA1 returns:
{
"status": 200,
"result": {
"outcode": "TA1",
"longitude": -3.10297767924529,
"latitude": 51.0133987332761,
"northings": 124359,
"eastings": 322721,
"admin_district": [
"Taunton Deane"
],
"parish": [
"Taunton Deane, unparished area",
"Bishop's Hull",
"West Monkton",
"Trull",
"Comeytrowe"
],
"admin_county": [
"Somerset"
],
"admin_ward": [
"Taunton Halcon",
"Bishop's Hull",
"Taunton Lyngford",
"Taunton Eastgate",
"West Monkton",
"Taunton Manor and Wilton",
"Taunton Fairwater",
"Taunton Killams and Mountfield",
"Trull",
"Comeytrowe",
"Taunton Blackbrook and Holway"
],
"country": [
"England"
]
}
}
If you have the whole postcode, the /postcodes endpoint will return considerably more detailed information which I will not include here, but it does include the outcode and the incode as separate fields.
I would, of course, recommend caching the results of any call to a remote API.
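One way to cache such results is sketched below. It is only an illustration under stated assumptions: `OutcodeCache` is a hypothetical name, and the actual HTTP call is left to a delegate supplied by the caller (for example a lambda that wraps `HttpClient.GetStringAsync` with whatever URL the API documentation specifies), so no endpoint is hard-coded here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class OutcodeCache
{
    // Caller-supplied function that performs the remote lookup for one outcode.
    private readonly Func<string, Task<string>> _fetch;

    // Outcodes are case-insensitive, so "ta1" and "TA1" share one entry.
    private readonly ConcurrentDictionary<string, Task<string>> _cache =
        new ConcurrentDictionary<string, Task<string>>(StringComparer.OrdinalIgnoreCase);

    public OutcodeCache(Func<string, Task<string>> fetch)
    {
        _fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
    }

    // Returns the cached task for the outcode, starting a fetch on first use.
    public Task<string> GetAsync(string outcode)
    {
        if (string.IsNullOrWhiteSpace(outcode))
            throw new ArgumentException("Outcode must not be empty.", nameof(outcode));

        return _cache.GetOrAdd(outcode.Trim(), _fetch);
    }
}
```

Two caveats with this simple design: a failed request stays cached until the process restarts, and under heavy contention `GetOrAdd` may invoke the delegate more than once for the same key. A production version would probably add expiry and eviction of failed entries.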
Build a regular expression from the list of known codes. The order of the codes in the regular expression matters: longer codes must come before shorter ones, otherwise a shorter code such as TR would match before TR21 is tried.
private void button1_Click(object sender, EventArgs e)
{
textBoxLog.Clear();
var regionList = BuildList();
var regex = BuildRegex(regionList.Keys);
TryMatch("B20 7TP", regionList, regex);
TryMatch("BT1 1AB", regionList, regex);
TryMatch("TR21 1AB", regionList, regex);
TryMatch("TR0 00", regionList, regex);
TryMatch("XX123", regionList, regex);
}
private static IReadOnlyDictionary<string, string> BuildList()
{
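// Case-insensitive keys: the regex uses RegexOptions.IgnoreCase, so a captured "b" must still find the key "B"
Dictionary<string, string> result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);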
result.Add("AL", "St Albans");
result.Add("B", "Birmingham");
result.Add("BT", "Belfast");
result.Add("TR", "Taunton");
result.Add("TR21", "Taunton X");
result.Add("TR22", "Taunton Y");
return result;
}
private static Regex BuildRegex(IEnumerable<string> codes)
{
// Sort the code by length descending so that for example TR21 is sorted before TR and is found by regex engine
// before the shorter match
codes = from code in codes
orderby code.Length descending
select code;
// Escape the codes to be used in the regex
codes = from code in codes
select Regex.Escape(code);
// create Regex Alternatives
string codesAlternatives = string.Join("|", codes.ToArray());
// A regex that starts with any of the codes and then has any data following
string lRegExSource = "^(" + codesAlternatives + ").*";
return new Regex(lRegExSource, RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
/// <summary>
/// Try to match the postcode to a region
/// </summary>
private bool CheckPostCode(string postCode, out string identifiedRegion, IReadOnlyDictionary<string, string> regionList, Regex regex)
{
// Check whether we have any match at all
Match match = regex.Match(postCode);
bool result = match.Success;
if (result)
{
// Take region code from first match group
// and use it in dictionary to get region name
string regionCode = match.Groups[1].Value;
identifiedRegion = regionList[regionCode];
}
else
{
identifiedRegion = "";
}
return result;
}
private void TryMatch(string code, IReadOnlyDictionary<string, string> regionList, Regex regex)
{
string region;
if (CheckPostCode(code, out region, regionList, regex))
{
AppendLog(code + ": " + region);
}
else
{
AppendLog(code + ": NO MATCH");
}
}
private void AppendLog(string log)
{
textBoxLog.AppendText(log + Environment.NewLine);
}
Produces this output:
B20 7TP: Birmingham
BT1 1AB: Belfast
TR21 1AB: Taunton X
TR0 00: Taunton
XX123: NO MATCH
For your information, the regex built here is ^(TR21|TR22|AL|BT|TR|B).*
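The same "longest code first" idea can also be expressed without a regular expression, by trying ever shorter prefixes of the postcode against the dictionary. The sketch below is an alternative to the regex above, not part of the original answer. `RegionLookup` and `TryFindRegion` are hypothetical names, and the code assumes that the dictionary keys are upper-case and at most four characters long.

```csharp
using System;
using System.Collections.Generic;

public static class RegionLookup
{
    // Tries the longest prefix first so that TR21 wins over TR.
    public static bool TryFindRegion(
        string postcode,
        IReadOnlyDictionary<string, string> regions,
        out string region)
    {
        region = string.Empty;
        if (string.IsNullOrWhiteSpace(postcode)) return false;

        // Assumption: keys are stored in upper case.
        string candidate = postcode.Trim().ToUpperInvariant();

        // Assumption: no key in the list is longer than four characters.
        for (int length = Math.Min(candidate.Length, 4); length > 0; length--)
        {
            string found;
            if (regions.TryGetValue(candidate.Substring(0, length), out found))
            {
                region = found;
                return true;
            }
        }
        return false;
    }
}
```

For the sample list this gives the same results as the regex version. It also shares the same limitation: any postcode that starts with B and has no longer matching key is reported as Birmingham, so the list needs an entry for every area that begins with the same letter.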
My problem is that my CSV file stores its data as JSON. I need to extract that data as efficiently as possible and create objects that hold it.
My csv-file looks like this:
1. 2022-09-19,"{
2. "timestamp": 41202503,
3. "machineId": 3567,
4. "status": 16,
5. "isActive": false,
6. "scanWidth": 5.0,
7. }"
8. 2022-09-19,"{
9. "timestamp": 41202505,
10. "machineId": 3568,
11. "status": 5,
12. "isActive": true,
13. "scanWidth": 1.4,
14. }"
15. 2022-09-19,"{
16. "timestamp": 41202507,
17. "machineId": 3569,
18. "status": 12,
19. "isActive": false,
20. "scanWidth": 6.2,
21. }"
In my project I would have a class called "MachineData" with all the relevant properties.
My question is: how can I extract the data stored in this CSV file?
Thanks again for helping!
Create a type to represent this data:
class ResultItem
{
public string Date { get; set; }
public string Timestamp { get; set; }
public string MachineId { get; set; }
public string Status { get; set; }
public string IsActive { get; set; }
public string ScanWidth { get; set; }
}
Use Regex and Newtonsoft.Json to extract the data:
//Remove the line number
csvText = Regex.Replace(csvText, @"\d+\. ", "");
//Match each item, capturing the date and the JSON in separate groups
var matches = Regex.Matches(csvText, @"(?<date>\d{4}-\d{2}-\d{2}),""(?<json>(.+\n){6}})", RegexOptions.CultureInvariant | RegexOptions.Multiline);
var results = new List<ResultItem>();
foreach (Match match in matches)
{
//Getting values from json group
var item = JsonConvert.DeserializeObject<ResultItem>(match.Groups["json"].Value);
//Getting value from date group
item.Date = match.Groups["date"].Value;
results.Add(item);
}
I wouldn't normally recommend parsing json with Regular Expressions, but this is a special case, so here is a solution entirely based on RegEx:
string pattern = @"(?<date>[0-9-]+),""{\s+""timestamp"":\s+(?<timestamp>[0-9]+),\s+""machineId"":\s+(?<machineId>[0-9]+),\s+""status"":\s+(?<status>[0-9]+),\s+""isActive"":\s+(?<isActive>(true|false)),\s+""scanWidth"":\s+(?<scanWidth>[0-9\.]+),\s+}""";
Regex rg = new Regex(pattern, RegexOptions.Multiline);
foreach (Match match in rg.Matches(File.ReadAllText("nameoffile.csv")))
{
Console.WriteLine(match.Groups["date"].Value);
Console.WriteLine(match.Groups["timestamp"].Value);
Console.WriteLine(match.Groups["machineId"].Value);
Console.WriteLine(match.Groups["status"].Value);
Console.WriteLine(match.Groups["isActive"].Value);
Console.WriteLine(match.Groups["scanWidth"].Value);
}
Note that this will fail if the input differs in the slightest (negative values, additional or missing white space, etc.).
EDIT
If the line numbers are part of the input, you need to add [0-9]+\.\s+ in the beginning of the Regex to swallow the line number, the dot, and the white space, giving:
string pattern = @"[0-9]+\.\s+(?<date>[0-9-]+),""{\s+""timestamp"":\s+(?<timestamp>[0-9]+),\s+""machineId"":\s+(?<machineId>[0-9]+),\s+""status"":\s+(?<status>[0-9]+),\s+""isActive"":\s+(?<isActive>(true|false)),\s+""scanWidth"":\s+(?<scanWidth>[0-9\.]+),\s+}""";
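A middle ground between the two approaches is to let a small regex find each record and leave the field parsing to a JSON library, so the values arrive as typed properties. The sketch below uses System.Text.Json rather than Newtonsoft.Json. The property types of `MachineData` are guessed from the sample values, `MachineCsv` is a hypothetical helper name, and the code assumes that the line numbers shown in the question are not part of the actual file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

public sealed class MachineData
{
    public string Date { get; set; } = "";
    public long Timestamp { get; set; }
    public int MachineId { get; set; }
    public int Status { get; set; }
    public bool IsActive { get; set; }
    public double ScanWidth { get; set; }
}

public static class MachineCsv
{
    // One record: a date, a comma, then a quoted JSON object without nested braces.
    private static readonly Regex Record = new Regex(
        @"(?<date>\d{4}-\d{2}-\d{2}),""(?<json>\{[^{}]*\})""");

    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true, // "machineId" maps to MachineId
        AllowTrailingCommas = true          // the sample has a comma before the closing brace
    };

    public static List<MachineData> Parse(string csvText)
    {
        var results = new List<MachineData>();
        foreach (Match m in Record.Matches(csvText))
        {
            var item = JsonSerializer.Deserialize<MachineData>(m.Groups["json"].Value, Options);
            if (item == null) continue;

            item.Date = m.Groups["date"].Value;
            results.Add(item);
        }
        return results;
    }
}
```

Calling `MachineCsv.Parse(File.ReadAllText(path))` would then return one `MachineData` object per record. Unlike the all-regex pattern, this version tolerates reordered fields and varying white space inside the JSON.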
I have a process that parses emails. The software that we're using to retrieve and store the contents of the body doesn't seem to include line-breaks, so I end up with something like this -
Good afternoon, [line-break] this is my email. [line-break] Info: data [line-break] More info: data
My [line-break] brackets are where the line breaks should be. However, when we extract the body, we get just the text. It makes it tough to parse the text without having the line breaks.
Essentially, what I need to do is parse each [Info]: [Data]. I can find where the [Info] tags begin, but without line breaks I'm struggling to know where the data associated with that info should end. The email is coming from Windows.
Is there any way to take plain text and encode it to some way that would include line breaks?
Example Email Contents
Good Morning, Order: 1234 The Total: $445 When: 7/10 Type: Dry
Good Morning, Order: 1235 The Total: $1743 Type: Frozen When: 7/22
Order: 1236 The Total: $950.14 Type: DRY When: 7/10
The Total: $514 Order: 1237 Type: Dry CSR: Tim W
Sorry, below is your order: Order: 1236 The Total: $500 When: 7/10 Type: Dry Creator: Josh A. Thank you
Now, I need to loop through the email and parse out the values for Order, Total, and Type. The other placeholder: values are irrelevant and random.
Try something like this.
You need to list all possible section identifiers. The list can be extended over time with more known identifiers, which reduces the chance of mistakes when parsing the strings.
As of now, if the value marked by a known identifier contains an unknown identifier, that part is removed when the string is parsed.
An unknown identifier on its own is ignored.
Regex.Matches extracts all matching parts and returns each one's Value, Index and Length, so it's simple to use [Input].Substring(Index, NextPosition - Index) to get the value that belongs to the requested part.
The EmailParser class's GetPartValue(string) method returns the content of an identifier by name (the name may or may not include the colon, e.g. "Order" or "Order:").
The Matches property returns a Dictionary<string, string> of all matched identifiers and their content. The content is cleaned up, as far as possible, by the CleanUpValue() method.
Adjust this method to handle specific or future requirements.
► If you don't pass a Pattern string, a default one is used.
► If you change the pattern by setting the CurrentPatter property (perhaps to one stored in the app settings, edited in a GUI, or whatever else), the Dictionary of matched values is rebuilt.
Initialize with:
string input = "Good Morning, Order: 1234 The Total: $445 Unknown: some value Type: Dry When: 7/10";
var parser = new EmailParser(input);
string value = parser.GetPartValue("The Total");
var values = parser.Matches;
public class EmailParser
{
static string m_Pattern = "Order:|The Total:|Type:|Creator:|When:|CSR:";
public EmailParser(string email) : this(email, null) { }
public EmailParser(string email, string pattern)
{
if (!string.IsNullOrEmpty(pattern)) {
m_Pattern = pattern;
}
Email = email;
this.Matches = GetMatches();
}
public string Email { get; }
public Dictionary<string, string> Matches { get; private set; }
public string CurrentPatter {
get => m_Pattern;
set {
if (value != m_Pattern) {
m_Pattern = value;
this.Matches = GetMatches();
}
}
}
public string GetPartValue(string part)
{
if (part[part.Length - 1] != ':') part += ':';
if (!Matches.Any(m => m.Key.Equals(part))) {
throw new ArgumentException("Part not included");
}
return Matches.FirstOrDefault(m => m.Key.Equals(part)).Value;
}
private Dictionary<string, string> GetMatches()
{
var dict = new Dictionary<string, string>();
var matches = Regex.Matches(Email, m_Pattern, RegexOptions.Singleline);
foreach (Match m in matches) {
int startPosition = m.Index + m.Length;
var next = m.NextMatch();
string parsed = next.Success
? Email.Substring(startPosition, next.Index - startPosition).Trim()
: Email.Substring(startPosition).Trim();
dict.Add(m.Value, CleanUpValue(parsed));
}
return dict;
}
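private string CleanUpValue(string value)
{
// A colon inside the value means an unknown identifier follows: cut it off
int pos = value.IndexOf(':');
if (pos < 0) return value;
// Cut at the last space before the colon; if there is none, the whole value is an unknown identifier
int cut = value.LastIndexOf(' ', pos);
return cut < 0 ? string.Empty : value.Substring(0, cut);
}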
}
I'm using Regex to match text from a file. I want to match two different strings, each of which appears more than once, which is why I am using a loop. I can match a single string but not both together.
Regex celcius = new Regex(@"""temp"":\d*\.?\d{1,3}");
foreach (Match match in celcius.Matches(htmlcode))
{
Regex date = new Regex(@"\d{4}-\d{2}-\d{2}");
foreach (Match match1 in date.Matches(htmlcode))
{
string date1 = Convert.ToString(match1.Value);
string temperature = Convert.ToString(match.Value);
Console.Write(temperature + "\t" + date1);
}
}
htmlcode:
{"temp":287.05,"temp_min":286.932,"temp_max":287.05,"pressure":1019.04,"sea_level":1019.04,"grnd_level":1001.11,"humidity":89,"temp_kf":0.12},"weather":[{"id":804,"main":"Clouds","description":"overcast
clouds","icon":"04n"}],"clouds":{"all":100},"wind":{"speed":0.71,"deg":205.913},"sys":{"pod":"n"},"dt_txt":"2019-09-22
21:00:00"},{"dt":1569196800,"main":{"temp":286.22,"temp_min":286.14,"temp_max":286.22,"pressure":1019.27,"sea_level":1019.27,"grnd_level":1001.49,"humidity":90,"temp_kf":0.08},"weather":[{"id":804,"main":"Clouds","description":"overcast
clouds","icon":"04n"}],"clouds":{"all":99},"wind":{"speed":0.19,"deg":31.065},"sys":{"pod":"n"},"dt_txt":"2019-09-23
00:00:00"},{"dt":1569207600,"main":{"temp":286.04,"temp_min":286,"temp_max":286.04,"pressure":1019.38,"sea_level":1019.38,"grnd_level":1001.03,"humidity":89,"temp_kf":0.04},"weather":
You can use a single Regex pattern with two capturing groups for temperature and date. The pattern can look something like this:
("temp":\d*\.?\d{1,3}).*?(\d{4}-\d{2}-\d{2})
Regex demo.
C# example:
string htmlcode = // ...
var matches = Regex.Matches(htmlcode, @"(""temp"":\d*\.?\d{1,3}).*?(\d{4}-\d{2}-\d{2})");
foreach (Match m in matches)
{
Console.WriteLine(m.Groups[1].Value + "\t" + m.Groups[2].Value);
}
Output:
"temp":287.05 2019-09-22
"temp":286.22 2019-09-23
Try it online.
I don't think you have HTML. I think you have a collection of JSON (JavaScript Object Notation) objects, which is a way to pass data efficiently.
So, this is one of your "HTML" objects.
{
"temp":287.05,
"temp_min":286.932,
"temp_max":287.05,
"pressure":1019.04,
"sea_level":1019.04,
"grnd_level":1001.11,
"humidity":89,
"temp_kf":0.12},
"weather":[{
"id":804,
"main":"Clouds",
"description":"overcast clouds",
"icon":"04n"
}],
"clouds":{
"all":100
},
"wind":{
"speed":0.71,"deg":205.913
},
"sys":{
"pod":"n"
},
"dt_txt":"2019-09-22 21:00:00"
}
So, I would recommend converting the line using the C# web helpers and parsing the objects directly.
//include this library
using System.Web.Helpers;
//parse your htmlcode using this loop
foreach(var line in htmlcode)
{
dynamic data = Json.Decode(line);
string temperature = Convert.ToString(data["temp"]);
string date = Convert.ToDateTime(data["dt_txt"]).ToString("yyyy-MM-dd");
Console.WriteLine($"temperature: {temperature} date: {date}");
}
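For comparison, here is a sketch of the same idea with System.Text.Json instead of System.Web.Helpers. It assumes that the caller has the complete, well-formed JSON array of forecast entries (the question only shows a truncated fragment) and that each entry has the shape seen there, with `temp` nested under `main` and the date in `dt_txt`. The `Forecast` class name is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class Forecast
{
    // Reads (temperature, date) pairs from a JSON array of forecast entries.
    public static List<(double Temp, string Date)> Read(string jsonArray)
    {
        var rows = new List<(double Temp, string Date)>();
        using (JsonDocument doc = JsonDocument.Parse(jsonArray))
        {
            foreach (JsonElement entry in doc.RootElement.EnumerateArray())
            {
                // Assumed shape: { "main": { "temp": ... }, "dt_txt": "yyyy-MM-dd HH:mm:ss" }
                double temp = entry.GetProperty("main").GetProperty("temp").GetDouble();
                string stamp = entry.GetProperty("dt_txt").GetString() ?? "";

                // Keep only the date part of the timestamp.
                string date = stamp.Length >= 10 ? stamp.Substring(0, 10) : stamp;
                rows.Add((temp, date));
            }
        }
        return rows;
    }
}
```

`GetProperty` throws if a key is missing, so input that deviates from the assumed shape fails loudly rather than producing wrong pairs.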
I am using a string list in c#, which contains a list of subjects.
E.g art, science, music.
I then have the user input "I would like to study science and art."
I would like to store the results into a variable, but I get lots of duplicates like "science, sciencemusic" (that's not a typo).
I think it's from the looping of the for each statement. Could there be an easier way to do this or is there something wrong in my code? I can't figure it out.
Here's my code:
string input = "I would like to study science and art.";
string result = "";
foreach (string sub in SubjectsClass.SubjectsList)
{
Regex rx = new Regex(sub, RegexOptions.IgnoreCase);
MatchCollection matches = rx.Matches(input);
foreach (Match match in matches)
{
result += match.Value;
}
}
The "SubjectsList" property of the subjects class is filled from a CSV file that contains only the names of random subjects:
CSV File:
Computing
English
Maths
Art
Science
Engineering
private List<string> subjects = new List<string>();
//Read data from csv file to list...
public List<string> SubjectsList
{
get { return subjects; }
}
Currently the output I get is this:
"input": "art science",
"Subject": "artscienceartscienceartscience"
If I change:
result += match.Value;
to
result += match.Value + " ";
I get lots of spaces.
edit: I should mention that this code runs on a WPF c# button press and then shows the result.
Using your code, and with the following test data:
List<string> subjects = new List<string>{"Science", "Art", "Maths"};
string input = "I would like to study science and art.";
I don't get duplicates.
To avoid blank matches, check that the matched value is not empty:
foreach (Match match in matches)
{
if (!string.IsNullOrEmpty(match.Value))
{
result += match.Value + " ";
}
}
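One possible cause of the blank matches is an empty line in the CSV file: an empty pattern matches at every position of the input, which would explain the run of spaces. The sketch below is a variation under that assumption, not a confirmed diagnosis. It skips empty subjects, builds a single alternation with word boundaries, and removes repeated hits. `SubjectFinder` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubjectFinder
{
    // Returns each subject mentioned in the input once, in order of appearance.
    public static List<string> Find(string input, IEnumerable<string> subjects)
    {
        // Drop empty entries and escape the rest so they are matched literally.
        List<string> terms = subjects
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => Regex.Escape(s.Trim()))
            .ToList();
        if (terms.Count == 0) return new List<string>();

        // \b keeps "art" from matching inside a longer word such as "party".
        var rx = new Regex(@"\b(?:" + string.Join("|", terms) + @")\b", RegexOptions.IgnoreCase);

        return rx.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Distinct()
            .ToList();
    }
}
```

With the subjects from the CSV file and the input "I would like to study science and art.", `string.Join(" ", SubjectFinder.Find(input, subjects))` gives "science art".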
I have a set of row data as follows:
List<String> l_lstRowData = new List<string> { "Data 1 32:01805043*0FFFFFFF",
"Data 3, 20.0e-3",
"Data 2, 1.0e-3 172:?:CRC" ,
"Data 6"
};
and two lists, "KeyList" and "ValueList":
List<string> KeyList = new List<string>();
List<string> ValueList = new List<string>();
I need to fill the two List<String> objects from the data in l_lstRowData using pattern matching.
Here is my pattern:
String l_strPattern = @"(?<KEY>(Data|data|DATA)\s[0-9]*[,]?[ ]?[0-9e.-]*)[ \t\r\n]*(?<Value>[0-9A-Za-z:?*!. \t\r\n\-]*)";
Regex CompiledPattern = new Regex(l_strPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
The two lists should finally contain:
KeyList
{ "Data 1" }
{ "Data 3, 20.0e-3" }
{ "Data 2, 1.0e-3" }
{ "Data 6" }
ValueList
{ "32:01805043*0FFFFFFF" }
{ "" }
{ "172:?:CRC" }
{ "" }
Scenario:
The KEY group should match "Data" followed by an integer value and, if a comma (,) follows, the double value after it.
The Value group should match the string after the whitespace: 32:01805043*0FFFFFFF in the first row and 172:?:CRC in the third.
Here is my sample code
for (int i = 0; i < l_lstRowData.Count; i++)
{
MatchCollection M = CompiledPattern.Matches(l_lstRowData[i], 0);
KeyList.Add(M[0].Groups["KEY"].Value);
ValueList.Add(M[0].Groups["Value"].Value);
}
But my pattern does not work in this situation.
EDIT
My code produces:
KeyList
{ "Data 1 32" } // 32 should be in the next list
{ "Data 3, 20.0e-3" }
{ "Data 2, 1.0e-3" }
{ "Data 6" }
ValueList
{ ":01805043*0FFFFFFF" }
{ "" }
{ "172:?:CRC" }
{ "" }
Please help me to rewrite my Pattern.
Your code works for me, so please define what's not working.
Also:
start your regexp with ^ and end it with $
use regex.Match() instead of Matches(), because you know it will only match once
I don't see why you use IgnorePatternWhitespace
use a plain comma instead of [,] and a plain space instead of [ ]
use \s instead of [ \t\r\n]
if you specify IgnoreCase, there is no need for Data|data|DATA or [A-Za-z]
And if you clean this up, maybe you can solve it alone :)
I think a simpler regex would work: (?<key>data \d(?:, [\d.e-]+)?)(?<value>.*) will match your keys and values, providing you use the RegexOptions.IgnoreCase flag too.
You can see the results at this Rubular example link.
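Applied to the question's code, the simpler pattern could be used roughly as follows. This is a sketch with a slightly adapted pattern: it adds the `^` and `$` anchors suggested in the other answer, allows multi-digit numbers with `\d+`, and consumes the separating white space before the value group. `RowSplitter` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowSplitter
{
    // key:   "Data <n>" optionally followed by ", <number>"
    // value: whatever remains after the separating white space (may be empty)
    private static readonly Regex Row = new Regex(
        @"^(?<key>data \d+(?:, [\d.e-]+)?)\s*(?<value>.*)$",
        RegexOptions.IgnoreCase);

    // Appends one key and one value per matching row to the given lists.
    public static void Split(IEnumerable<string> rows, List<string> keys, List<string> values)
    {
        foreach (string row in rows)
        {
            Match m = Row.Match(row);
            if (!m.Success) continue; // skip rows that do not start with "Data <n>"

            keys.Add(m.Groups["key"].Value);
            values.Add(m.Groups["value"].Value);
        }
    }
}
```

For the four sample rows this fills the key list with "Data 1", "Data 3, 20.0e-3", "Data 2, 1.0e-3" and "Data 6", and the value list with "32:01805043*0FFFFFFF", "", "172:?:CRC" and "".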