I'm using a list of regexes to parse FTP server directory listings. I'm not good with regex at all; this is a list of patterns I collected online to parse the output of various FTP servers:
private static readonly string[] DirectoryParseFormats =
{
"(?<dir>[\\-d])(?<permission>([\\-r][\\-w][\\-xs]){3})\\s+\\d+\\s+\\w+\\s+\\w+\\s+(?<size>\\d+)\\s+(?<timestamp>\\w+\\s+\\d+\\s+\\d{4})\\s+(?<name>.+)",
"(?<dir>[\\-d])(?<permission>([\\-r][\\-w][\\-xs]){3})\\s+\\d+\\s+\\d+\\s+(?<size>\\d+)\\s+(?<timestamp>\\w+\\s+\\d+\\s+\\d{4})\\s+(?<name>.+)",
"(?<dir>[\\-d])(?<permission>([\\-r][\\-w][\\-xs]){3})\\s+\\d+\\s+\\d+\\s+(?<size>\\d+)\\s+(?<timestamp>\\w+\\s+\\d+\\s+\\d{1,2}:\\d{2})\\s+(?<name>.+)",
"(?<dir>[\\-d])(?<permission>([\\-r][\\-w][\\-xs]){3})\\s+\\d+\\s+\\w+\\s+\\w+\\s+(?<size>\\d+)\\s+(?<timestamp>\\w+\\s+\\d+\\s+\\d{1,2}:\\d{2})\\s+(?<name>.+)",
"(?<dir>[\\-d])(?<permission>([\\-r][\\-w][\\-xs]){3})(\\s+)(?<size>(\\d+))(\\s+)(?<ctbit>(\\w+\\s\\w+))(\\s+)(?<size2>(\\d+))\\s+(?<timestamp>\\w+\\s+\\d+\\s+\\d{2}:\\d{2})\\s+(?<name>.+)",
"(?<timestamp>\\d{2}\\-\\d{2}\\-\\d{2}\\s+\\d{2}:\\d{2}[AaPp][Mm])\\s+(?<dir>\\<\\w+\\>){0,1}(?<size>\\d+){0,1}\\s+(?<name>.+)"
};
Now I've stumbled upon the following output from an odd FTP server. What's weird is that the server outputs the file name together with the folder name for some reason.
Anyway, I'd like a similar regex for this string, ideally one that separates out the folder name. The string returned by the server is what's inside the pipes (|):
|-rw-rw-rw- 1 generic 235 Mar 22 11:21 fromDoder/DOD997ABCD.20170322112114159.1961812284.txt|
EDIT:
Here is the C# code I use to iterate through the regexes and pick the one that matches the FTP server output. I then use that match to parse out the file name and type:
// Use our regex library to parse
match = DirectoryParseFormats.Select(dpf => new Regex(dpf).Match(raw)).FirstOrDefault(m => m.Success);
if (match == null) throw new Exception($"Can't parse FTP directory list item. raw item: |{raw}|, whole response: |{response}|");
// If not directory - this is file
var dir = match.Groups["dir"].Value;
if (dir == string.Empty || dir == "-") list.Add(match.Groups["name"].Value);
EDIT 2:
total 0
drw-rw-rw- 1 user group 0 Apr 23 2016 .
drw-rw-rw- 1 user group 0 Apr 23 2016 ..
EDIT 3:
var hintRegex = @"^
(?<dir>[-d])
(?<permission>(?:[-r][-w][-xs]){3})
\s+\d+
\s+\w+
(?:\s+\w+)?
\s+(?<size>\d+)
\s+(?<timestamp>\w+\s+\d+(?:\s+\d+(?::\d+)?))
\s+(?!(?:\.|\.\.)\s*$)(?<name>.+?)\s*
$";
Match match = new Regex(hintRegex).Match("-rw-r--r-- 1 ftp ftp 1079 Apr 06 2017 LEANCOR_040617084839.txt");
if (!match.Success) Debug.WriteLine("Doesn't match");
Since your pattern looks like you're trying to match the output of ls -l, and you mention it's a list command, I'm assuming that's what it is.
The main problem I could gather from your code is that you're missing the multiline flag (RegexOptions.Multiline).
Your regex seems correct overall; I only made a few changes. Here it is laid out with some spacing (which still works if you use the extended flag, called RegexOptions.IgnorePatternWhitespace in .NET):
^
(?<dir>[-d])
(?<permission>(?:[-r][-w][-xs]){3})
\s+\d+
\s+\w+
(?:\s+\w+)?
\s+(?<size>\d+)
\s+(?<timestamp>\w+\s+\d+(?:\s+\d+(?::\d+)?))
\s+(?!(?:\.|\.\.)\s*$)(?<name>.+?)\s*
$
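The spaced-out layout above is only valid when the regex is constructed with RegexOptions.IgnorePatternWhitespace; without it, the line breaks and indentation are treated as literal characters to match, which would explain why the hintRegex test in EDIT 3 prints "Doesn't match". A minimal C# sketch under that assumption (the class and method names are illustrative, not from the original):

```csharp
using System.Text.RegularExpressions;

public static class LsLineParser
{
    // Same pattern as above, kept in its spaced-out form.
    // IgnorePatternWhitespace strips the unescaped line breaks and indentation;
    // Multiline lets ^ and $ match at the start and end of every line.
    private static readonly Regex LsLine = new Regex(@"^
        (?<dir>[-d])
        (?<permission>(?:[-r][-w][-xs]){3})
        \s+\d+
        \s+\w+
        (?:\s+\w+)?
        \s+(?<size>\d+)
        \s+(?<timestamp>\w+\s+\d+(?:\s+\d+(?::\d+)?))
        \s+(?!(?:\.|\.\.)\s*$)(?<name>.+?)\s*
        $", RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

    // Returns the captured name, or null when the line does not match
    // (for example the "." and ".." entries, which the lookahead rejects).
    public static string ParseName(string line)
    {
        Match m = LsLine.Match(line);
        return m.Success ? m.Groups["name"].Value : null;
    }
}
```

Because both options are combined, the same readable pattern can also be run over a whole multi-line listing with Regex.Matches.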
You can test it by doing:
using System;
using System.Text.RegularExpressions;
string pattern = @"^(?<dir>[-d])(?<permission>(?:[-r][-w][-xs]){3})\s+\d+\s+\w+(?:\s+\w+)?\s+(?<size>\d+)\s+(?<timestamp>\w+\s+\d+(?:\s+\d+(?::\d+)?))\s+(?!(?:\.|\.\.)\s*$)(?<name>.+?)\s*$";
Regex re = new Regex(pattern, RegexOptions.Multiline);
string source = @"
-rwxr-xr-x 1 root 46789 Feb 7 23:15 certbot-auto
drwxr-xr-x 2 root 4096 Mar 22 16:29 test dir
drwxr-xr-x 4 root 4096 Feb 10 15:50 www
-rw-rw-rw- 1 generic 235 Mar 22 11:21 fromDoder/DOD997ABCD.20170322112114159.1961812284.txt
-rw-rw-rw- 1 cmuser cmuser 904 Mar 23 15:04 20170323110427785_3741647.edi
drw-rw-rw- 1 user group 0 Apr 23 2016 .
drw-rw-rw- 1 user group 0 Apr 23 2016 ..
drw-rw-rw- 1 user group 0 Apr 23 2016 .cache
drw-rw-rw- 1 user group 0 Apr 23 2016 .bashrc
";
MatchCollection matches = re.Matches(source);
Console.WriteLine(matches.Count);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups["dir"]);
Console.WriteLine(match.Groups["permission"]);
Console.WriteLine(match.Groups["size"]);
Console.WriteLine(match.Groups["timestamp"]);
Console.WriteLine(match.Groups["name"]);
Console.WriteLine();
}
Note that the content of source is just an edited version of the output of ls -l on my server (with the addition of your example). So if my assumptions are correct, it should look familiar to you.
Edit: Based on your comment, you simply need to make the second \s+\w+ optional, i.e. (?:\s+\w+)? (I've updated all of the above to reflect that).
The regex for the given input string is as follows:
(?<permission>([\\-rwxs]+){3})\\s+\\d+\\s+\\w+\\s+(?<size>\\d+)\\s+(?<timestamp>\\w+\\s+\\d+\\s+\\d{1,2}:\\d{1,2})\\s+(?<folder>\\w+\\/)?(?<name>.+)
I tested this pattern against the given input string with an online regex tester.
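A minimal C# check of that pattern against the sample line, written as a verbatim string so the backslashes need no doubling. The wrapper class and method names are hypothetical; note that the timestamp part only accepts the hh:mm form used in the sample, not the year form.

```csharp
using System.Text.RegularExpressions;

public static class OddServerListing
{
    // The pattern from the answer above; the optional "folder" group captures a
    // leading "word/" prefix and "name" receives the rest of the line.
    private static readonly Regex Line = new Regex(
        @"(?<permission>([\-rwxs]+){3})\s+\d+\s+\w+\s+(?<size>\d+)\s+(?<timestamp>\w+\s+\d+\s+\d{1,2}:\d{1,2})\s+(?<folder>\w+\/)?(?<name>.+)");

    // Returns (folder, name), or null if the line does not match.
    // Folder is an empty string when the line has no "folder/" prefix.
    public static (string Folder, string Name)? Split(string raw)
    {
        Match m = Line.Match(raw);
        if (!m.Success) return null;
        return (m.Groups["folder"].Value, m.Groups["name"].Value);
    }
}
```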
I have made a Docker installation for a C# application, and I'm having some issues with the environment (especially DOTNET_VERSION, but also the other variables shown in the "Inspect" tab of Docker Compose). To understand the whole situation, I would like to know where the Docker installation gets this information from. (I didn't find anything in the Visual Studio *.csproj file, nor in the Docker(-compose) *.yml files.)
This is what the current "Inspect" page looks like:
Environment
DOTNET_USE_POLLING_FILE_WATCHER=1
ASPNETCORE_LOGGING__CONSOLE__DISABLECOLORS=true
ASPNETCORE_ENVIRONMENT=Development
ASPNETCORE_URLS=https://+:443;http://+:80
NUGET_PACKAGES=/root/.nuget/fallbackpackages5
NUGET_FALLBACK_PACKAGES=/root/.nuget/fallbackpackages;...;/root/.nuget/fallbackpackages5
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DOTNET_RUNNING_IN_CONTAINER=true
DOTNET_VERSION=5.0.17
ASPNET_VERSION=5.0.17
Mounts (container path -> host source)
/APP -> C:\Firm\Projecten...\Server\Connections\ConnectionsSvc
/REMOTE_DEBUGGER -> C:\Users\Usr\vsdbg\vs2017u5
/SRC -> /run/desktop/mnt/host/c/Firm/Projecten/.../AnotherSvc
/ROOT/.NUGET/FALLBACKPACKAGES4 -> C:\Program Files\dotnet\sdk\NuGetFallbackFolder
/ROOT/.NUGET/FALLBACKPACKAGES5 -> C:\Users\Usr\.nuget\packages\
/ROOT/.ASPNET/HTTPS -> C:\Users\Usr\AppData\Roaming\ASP.NET\Https
/ROOT/.MICROSOFT/USERSECRETS -> C:\Users\Usr\AppData\Roaming\Microsoft\UserSecrets
/ROOT/.NUGET/FALLBACKPACKAGES -> C:\Program Files (x86)\DevExpress 21.2\Components\Offline Packages
/ROOT/.NUGET/FALLBACKPACKAGES2 -> C:\Program Files (x86)\Microsoft Visual Studio\Shared\NuGetPackages
/ROOT/.NUGET/FALLBACKPACKAGES3 -> C:\Program Files (x86)\Progress\ToolboxNuGetPackages
Ports (container port -> host binding)
443/tcp -> 0.0.0.0:49154
80/tcp -> 0.0.0.0:5128
Edit after comments from Benoit.Be:
Pardon my ignorance, but I have no idea where to look for the FROM ... entry you mention.
I'll show you what my "DockerCompose" directory looks like (it's on a Windows computer, but I use WSL to run command-line commands):
Prompt> ls -ltra
total 12
-rwxrwxrwx 1 usr usr 5602 Oct 4 09:44 docker-compose.yml
drwxrwxrwx 1 usr usr 4096 Oct 4 09:47 mssqlserver
drwxrwxrwx 1 usr usr 4096 Oct 4 09:47 elasticsearch
drwxrwxrwx 1 usr usr 4096 Oct 4 09:47 rabbitmq
drwxrwxrwx 1 usr usr 4096 Oct 10 11:24 .
-rwxrwxrwx 1 usr usr 1713 Oct 10 11:24 docker-compose.base.yml
drwxrwxrwx 1 usr usr 4096 Oct 10 11:25 Envoy
drwxrwxrwx 1 usr usr 4096 Nov 22 09:33 ..
Prompt> find ./ -maxdepth 3 -type d
./
./elasticsearch
./elasticsearch/data
./elasticsearch/data/nodes
./Envoy
./mssqlserver
./mssqlserver/data
./rabbitmq
./rabbitmq/data
./rabbitmq/data/mnesia
./rabbitmq/log
Where can I look for the mentioned information?
Oh, I ran a "find in files" for the "FROM" keyword and found something in the "Envoy" directory. Does this mean that the "Envoy" container is what you call the base image?
Here's the entry:
./Envoy/Dockerfile:FROM envoyproxy/envoy-dev:9105f45c7fb872d1db2bf8q9bc608368effe77cd
Hello fellow programmers.
I've now spent a whole day reading threads trying to solve this problem.
I am parsing HTML from an automatically generated schedule; the same scheduling program was discussed six years ago in this thread: Parsing complex HTML tables
But the Java/JavaScript solutions there won't work for me. Also, the programs mentioned there no longer work; I think a new version of the software has been released. This is the example I am trying to parse: https://www.ostfalia.de/cms/de/b/studium/stundenplaene/download/ss19_b_stdgrp_ai_6.html
I need the parsed data in the right sequence because I want to generate an iCalendar file from it, or pass the data to a schedule app I wrote myself.
I am using the HTML Agility Pack and I'm already successful at parsing what I need, but I can't get it in the right order: it comes out completely split up, because HAP walks the table row by row like any other parser. I'm so desperate that I nearly resorted to counting the empty <tr>s to estimate when a new row begins, but that doesn't work because the program sometimes emits more and sometimes fewer empty lines. Does anybody have an idea?
This is my Code to get the infos I need:
// Requires: using System.Net; using HtmlAgilityPack;
WebClient client = new WebClient();
string html = client.DownloadString("https://www.ostfalia.de/cms/de/b/studium/stundenplaene/download/ss19_b_stdgrp_ai_6.html");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// All <td class="v"> cells (the events); SelectNodes returns null when nothing matches.
var erg = doc.DocumentNode.SelectNodes("//td[@class='v']");
if (erg != null)
{
for (int i = 0; i < erg.Count; i++)
{
txt_check.Text = erg[i].InnerText;
list_check.Items.Add(erg[i].InnerText);
}
}
The class 'v' cells are the events and the class 't' cells are the dates; in this example I only query class 'v'.
The output of class='v' looks like this:
10:30 - 12:00 UhrAI 6.1 Komponentenbasierte SoftwareentwicklungB. RogallaB 109
13:00 - 14:30 UhrAI WPF-12 CCNA 2 Cisco Routing & SwitchingChr. HollmannA 201
13:00 - 14:30 UhrAI WPF Inst. u. Betrieb einer Datenb. a. B. OracleD. HeringA
14:45 - 16:15 UhrAI WPF Mathematik III für InformatikerT. WaldeerB 27 114
14:45 - 16:15 UhrAI WPF-18 AutomatisierungstechnikF. DziembowskiA 107
The output of class='t' looks like:
"Di, 05.03.2019"
"Mi, 06.03.2019"
"Do, 07.03.2019"
"Fr, 08.03.2019"
"Sa, 09.03.2019"
I hope someone has an idea how I can sort and match this information in a Dictionary or List so that I can turn it into an ICS file.
The output should look like:
"MI, 06.03.2019 , 8:45 - 10:15 ,AI 6.1 Komponentenbasierte Softwareentwicklung B. Rogalla B109 , 10:30 - 12:00 AI 6.1 Komponentenbasierte Softwareentwicklung B. Rogalla B109"
...
That way I can bring it into the ICS format, or into DATE/TIME values for a calendar app or something similar.
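Once each event has been paired with its day (from a class 't' cell such as "Mi, 06.03.2019") and its time range (such as "10:30 - 12:00 Uhr"), writing an iCalendar entry is mostly date parsing and string formatting. A minimal sketch, assuming floating local times (no time zone attached); IcsSketch, ParseSlot and ToVEvent are hypothetical names:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class IcsSketch
{
    // Parses a day header such as "Mi, 06.03.2019" and a time range such as
    // "10:30 - 12:00 Uhr" into start/end DateTimes.
    public static (DateTime Start, DateTime End) ParseSlot(string dayHeader, string timeRange)
    {
        // Keep only the "dd.MM.yyyy" part of the header.
        string date = Regex.Match(dayHeader, @"\d{2}\.\d{2}\.\d{4}").Value;
        Match t = Regex.Match(timeRange, @"(\d{1,2}:\d{2})\s*-\s*(\d{1,2}:\d{2})");
        if (date.Length == 0 || !t.Success)
            throw new FormatException("Unrecognised day header or time range.");

        const string format = "dd.MM.yyyy H:mm";
        DateTime start = DateTime.ParseExact(date + " " + t.Groups[1].Value, format, CultureInfo.InvariantCulture);
        DateTime end = DateTime.ParseExact(date + " " + t.Groups[2].Value, format, CultureInfo.InvariantCulture);
        return (start, end);
    }

    // Formats one VEVENT with floating local times (yyyyMMddTHHmmss).
    public static string ToVEvent(DateTime start, DateTime end, string summary)
    {
        const string f = "yyyyMMdd'T'HHmmss";
        return "BEGIN:VEVENT\r\n" +
               "DTSTART:" + start.ToString(f, CultureInfo.InvariantCulture) + "\r\n" +
               "DTEND:" + end.ToString(f, CultureInfo.InvariantCulture) + "\r\n" +
               "SUMMARY:" + summary + "\r\n" +
               "END:VEVENT\r\n";
    }
}
```

A complete .ics file per RFC 5545 also needs the BEGIN:VCALENDAR/END:VCALENDAR wrapper with VERSION and PRODID, plus UID and DTSTAMP in each VEVENT; commas and semicolons in SUMMARY have to be backslash-escaped.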
Pastebin for the whole HTML:
https://pastebin.com/hHbJTujN
Some pictures of the output:
https://drive.google.com/open?id=16Y_hISdVEvzlrS6LCmBMcwAarGhz__t0
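The ordering problem typically comes from rowspan/colspan: a cell's index within its <tr> is not the column it is rendered in, so reading row by row scrambles the days. One common approach is to expand the table into a grid first and then read each day's column top to bottom. A sketch of that idea, independent of HtmlAgilityPack: it assumes the rows have already been turned into lists of a hypothetical Cell type (with HAP, the spans could be read via GetAttributeValue("rowspan", 1)), that every span is at least 1, and that the column count is known.

```csharp
using System.Collections.Generic;

// Hypothetical input type: a cell's text plus its rowspan/colspan values.
public sealed class Cell
{
    public Cell(string text, int rowSpan = 1, int colSpan = 1)
    {
        Text = text;
        RowSpan = rowSpan;
        ColSpan = colSpan;
    }

    public string Text { get; }
    public int RowSpan { get; }
    public int ColSpan { get; }
}

public static class TableGrid
{
    // Expands rows of cells into a rectangular grid so that every cell sits in
    // the column where a browser would render it. A cell spanning several rows
    // or columns is stored in every slot it covers; empty slots stay null.
    public static List<Cell[]> Expand(IReadOnlyList<IReadOnlyList<Cell>> rows, int columnCount)
    {
        var grid = new List<Cell[]>();
        for (int r = 0; r < rows.Count; r++)
        {
            EnsureRows(grid, r + 1, columnCount);
            int col = 0;
            foreach (Cell cell in rows[r])
            {
                // Skip slots already taken by a rowspan from an earlier row.
                while (col < columnCount && grid[r][col] != null) col++;
                EnsureRows(grid, r + cell.RowSpan, columnCount);
                for (int dr = 0; dr < cell.RowSpan; dr++)
                    for (int dc = 0; dc < cell.ColSpan && col + dc < columnCount; dc++)
                        grid[r + dr][col + dc] = cell;
                col += cell.ColSpan;
            }
        }
        return grid;
    }

    // Appends empty rows until the grid has at least `count` rows.
    private static void EnsureRows(List<Cell[]> grid, int count, int columnCount)
    {
        while (grid.Count < count) grid.Add(new Cell[columnCount]);
    }
}
```

Assuming the days run across the columns, reading column c of the expanded grid from top to bottom then gives that day's events in time order, and the header row supplies the date.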
I have a problem configuring Sphinx + MySQL on my machine (Windows 7).
I use Sphinx 2.0.6 and MySQL Connector 6.5.5 to query Sphinx from C# code. Everything works fine when I search for words in English ("madrid", for example). But when I send a query from C# code that contains a Cyrillic word (which should have been indexed), I get no results. Here is what I see in the "query.log" file:
[Tue Mar 26 16:35:12.642 2013] 0.000 sec [ext2/0/ext 0 (0,10)] [airportIndex] ????
Latin words look normal:
[Tue Mar 26 16:35:06.195 2013] 0.000 sec [ext2/0/ext 0 (0,10)] [airportIndex] *mosc*
The charset_table seems to be correct in config:
charset_type = utf-8
charset_table = 0..9, A..Z->a..z, _, a..z, \
U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0401->U+0435, U+0451->U+0435
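For reference, this charset_table maps the uppercase Cyrillic block U+0410..U+042F onto U+0430..U+044F, keeps U+0430..U+044F as is, and folds Ё/ё (U+0401/U+0451) onto е (U+0435); characters not listed are treated as separators. A small C# sketch that mimics the mapping (a hypothetical helper, not part of Sphinx or the connector) can show what the index should contain for a given word:

```csharp
using System.Text;

public static class SphinxFold
{
    // Mirrors the charset_table lines above:
    //   0..9, a..z, _   -> kept
    //   A..Z            -> a..z
    //   U+0410..U+042F  -> U+0430..U+044F (Cyrillic upper -> lower)
    //   U+0430..U+044F  -> kept
    //   U+0401, U+0451  -> U+0435 (Ё, ё -> е)
    // Any other character is treated as a separator and mapped to a space.
    public static string Fold(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || c == '_') sb.Append(c);
            else if (c >= 'A' && c <= 'Z') sb.Append((char)(c + 32));
            else if (c >= '\u0410' && c <= '\u042F') sb.Append((char)(c + 0x20));
            else if (c >= '\u0430' && c <= '\u044F') sb.Append(c);
            else if (c == '\u0401' || c == '\u0451') sb.Append('\u0435');
            else sb.Append(' ');
        }
        return sb.ToString();
    }
}
```

Under this mapping a literal "????" contains no indexable characters at all, which fits the empty results: the Cyrillic text was already lost before Sphinx saw the query.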
I just don't know what to do. I've googled for a solution all day and tried many different fixes, but none of them helped. Could anyone help me here?
Found it. It was a connector bug (or feature, I'm not sure). The connector was trying to get the server's datetime offset and failed, because Sphinx does not support that function. I just commented out that line of code (inside MySql.Data.dll), and it started working correctly.
I have the following output from a utility I use for data processing.
Processed output from W765 build 66721
File target: C:\Documents and Settings\Jon\Desktop\test\1024\cards.dat
Cards loaded: 876 1456 1457 1459 2072
Errors encountered (0)
Warnings encountered (0)
Pass
I want a .NET regex that retrieves just 876 1456 1457 1459 2072 as groups, and nothing else.
I've got as far as this, which works:
([0-9]\d+)+
but unfortunately it yields
Found 8 matches:
765
66721
1024
876
1456
1457
1459
2072
I thought this would work instead
.*(?:Cards loaded\: )([0-9]\d+)+
but it doesn't.
Can someone please point me in the right direction?
Thank you
Jonathan Bolton
Try this:
Cards loaded:(?'digits'(\d|\s)+)
This will return the numeric portion you need in the named group "digits".
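In .NET specifically, a repeated group keeps every value it matched in Group.Captures, so a single match can yield each number separately instead of one space-separated string. A short sketch; the group name n and the helper class are illustrative:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CardsLoaded
{
    // "(?:[ \t]+(?<n>\d+))+" repeats the group; .NET records every repetition in
    // Groups["n"].Captures, not only the last one. [ \t] (rather than \s) keeps
    // the match from continuing onto the next line.
    private static readonly Regex Pattern = new Regex(@"Cards loaded:(?:[ \t]+(?<n>\d+))+");

    public static List<string> Numbers(string text)
    {
        var result = new List<string>();
        Match m = Pattern.Match(text);
        if (!m.Success) return result;
        foreach (Capture c in m.Groups["n"].Captures)
            result.Add(c.Value);
        return result;
    }
}
```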
Maybe you could try Cards loaded\: [\d\s]+ to return "Cards loaded: 876 1456 1457 1459 2072", then run \d+ on that string to get each of the relevant numbers.
Use this:
(?m)(?<=^Cards loaded: (?:\d+\s)*)\d+
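This works because .NET supports variable-length lookbehind: each \d+ matches only if everything before it on the line is "Cards loaded: " followed by earlier numbers. With Regex.Matches, each match is one number. A minimal usage sketch (the class name is illustrative):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CardsLookbehind
{
    // (?m) makes ^ match at the start of every line; the lookbehind then requires
    // "Cards loaded: " plus zero or more "number + whitespace" pairs immediately
    // before each number that is returned.
    private static readonly Regex Pattern = new Regex(@"(?m)(?<=^Cards loaded: (?:\d+\s)*)\d+");

    public static string[] Numbers(string text) =>
        Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```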
Does it have to be just a regex? You could first cut off the part you don't want from the beginning, before matching:
string toSearch = @"Processed output from W765 build 66721 File target: C:\Documents and Settings\Jon\Desktop\test\1024\cards.dat Cards loaded: 876 1456 1457 1459 2072 Errors encountered (0) Warnings encountered (0) Pass";
string shortened = toSearch.Substring(toSearch.IndexOf("Cards loaded:"));
var matches = Regex.Matches(shortened, @"([0-9]\d+)+");
I'm using Interop.Domino.dll to retrieve e-mails from a Lotus "database" (term used loosely). I'm having some difficulty retrieving certain fields and wonder how to do this properly. I've been using NotesDocument.GetFirstItem to retrieve Subject, From and Body.
My issues in this regard are:
How do I retrieve Reply-To address? Is there a list of "Items" to get somewhere? I can't find it.
How do I retrieve friendly names for From and Reply-To addresses?
When I retrieve Body this way, it's formatted weirdly, with sets of square brackets ([]) interspersed randomly across the message body, and parts of the text aren't where I expect them.
Related code:
string
ActualSubject = nDoc.GetFirstItem("Subject").Text,
ActualFrom = nDoc.GetFirstItem("From").Text,
ActualBody = nDoc.GetFirstItem("Body").Text;
Hah, got it!
Object[] ni = (Object[])nDoc.Items;
string names_values = "";
for (int x = 0; x < ni.Length; x++)
{
NotesItem item = (NotesItem)ni[x];
if (!string.IsNullOrEmpty(item.Name)) names_values += x.ToString() + ": " + item.Name + "\t\t" + item.Text + "\r\n";
}
This returned a list of indices, names, and values:
0: Received from example.com ([192.168.0.1]) by host.example.com (Lotus Domino Release 6.5.4 HF182) with ESMTP id 2008111917343129-205078 ; Wed, 19 Nov 2008 17:34:31 -0500
1: Received from example.com ([192.168.0.2]) by host2.example.com (Lotus Domino Release 6.5.4 HF182) with ESMTP id 2008111917343129-205078 ; Wed, 19 Nov 2008 17:34:31 -0500
2: X_PGRTRKID 130057945714t
3: X_PGRSRC IE
4: ReplyTo "example" <name@email.example.com>
5: Principal "example" <customerservice@email.example.com>
6: From "IE130057945714t"<service@test.email.example.com>
7: SendTo me@example.com
8: Subject (Message subject redacted)
9: PostedDate 11/19/2008 03:34:15 PM
10: MIME_Version 1.0
11: $Mailer SMTP DirectMail
12: $MIMETrack Itemize by SMTP Server on xxxPT02-CORP/example(Release 6.5.4 HF182|May 31, 2005) at 11/19/2008 05:34:31 PM;Serialize by Router on xxxPT02-CORP/example(Release 6.5.4 HF182|May 31, 2005) at 11/19/2008 05:34:32 PM;Serialize complete at 11/19/2008 05:34:32 PM;MIME-CD by Router on xxxPT02-CORP/example(Release 6.5.4 HF182|May 31, 2005) at 11/19/2008 05:34:32 PM;MIME-CD complete at 11/19/2008 05:34:32 PM;Itemize by Router on camp-db-05/example(Release 7.0.2 HF76|November 03, 2006) at 11/19/2008 05:34:32 PM;MIME-CD by Notes Client on MyName/Guest/example(Release 6.5.6|March 06, 2007) at 11/20/2008 12:46:25 PM;MIME-CD complete at 11/20/2008 12:46:25 PM
13: Form Memo
14: $UpdatedBy ;CN=xxxPT02-CORP/O=example
15: $ExportHeadersConverted 1
16: $MessageID <redacted@LocalDomain>
17: RouteServers CN=xxxPT02-CORP/O=example;CN=camp-db-05/O=example
18: RouteTimes 11/19/2008 03:34:31 PM-11/19/2008 03:34:32 PM;11/19/2008 03:34:32 PM-11/19/2008 03:34:32 PM
19: $Orig 958F2E4E4B666AB585257506007C02A7
20: Categories
21: $Revisions
22: DeliveredDate 11/19/2008 03:34:32 PM
23: Body []exampleexample
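Items 4 (ReplyTo) and 6 (From) above already carry the friendly name in the usual "name" <address> form. Assuming an item's Text is in that form, System.Net.Mail.MailAddress can split it; the helper below is an illustrative sketch, not part of the Domino interop:

```csharp
using System;
using System.Net.Mail;

public static class FriendlyName
{
    // Splits a value such as "\"example\" <name@email.example.com>" into its
    // display name and bare address. Returns null when MailAddress rejects the
    // format (FormatException); null or empty input still throws.
    public static (string DisplayName, string Address)? Split(string headerValue)
    {
        try
        {
            var addr = new MailAddress(headerValue);
            return (addr.DisplayName, addr.Address);
        }
        catch (FormatException)
        {
            return null;
        }
    }
}
```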
Now, who can tell me why the Body keeps getting messed up?
The Body item is a NotesRichTextItem, not a regular NotesItem. They are a different type of object in the Lotus Notes world (and often the source of much developer frustration!)
I don't have much experience with using COM to connect to Domino, and I know there are differences in what you have access to, but the Domino Designer Help should give you lots of information about the classes, such as NotesRichTextItem.
Perhaps the method "GetFormattedText" would work better for you than accessing the item's Text property.
Here's an example of the method (taken from Domino Designer Help)
Dim doc As NotesDocument
Dim rtitem As Variant
Dim plainText As String
Dim fileNum As Integer
'...set value of doc...
Set rtitem = doc.GetFirstItem( "Body" )
If ( rtitem.Type = RICHTEXT ) Then
plainText = rtitem.GetFormattedText( False, 0 )
End If
' get a file number for the file
fileNum = Freefile
' open the file for writing
Open "c:\plane.txt" For Output As fileNum
' write the formatted text to the file
Print #fileNum, plainText
' close the file
Close #fileNum
It may not work depending on how your environment is set up, but the easiest way to deal with mail in Domino is to leave it as MIME and get at the values via NotesMIMEEntity and NotesMIMEHeader. This will only work if the mail came in from the web rather than from native Notes, and the environment has been set up to store mail in MIME format.
Otherwise you need to access the body as a NotesRichTextItem. From that item you can get a NotesRichTextNavigator, which lets you move around the rich text structure if you need to.
If you think the structure should be relatively simple, try calling NotesRichTextItem.GetFormattedText(). If that still isn't working, you're going to need to work out what is happening by experimenting with an example document and seeing what the structure looks like to the NotesRichTextNavigator.