Do not match YouTube URLs with beginning double quotes - C# Regex


I have the following C# regex:
@"(?:https?:\/\/)?(?:www\.)?(?:(?:(?:youtube\.com\/watch\?[^?]*v=|youtu\.be\/)))([\w-]+)";
How can I correct this so the regex won't match URLs with a double quote at the beginning of the URL? That way, if the URL is in the href attribute of a hyperlink, it will be ignored and not captured.
I've used this expression in my other Twitter Regex pattern, but I can't make it work in this one.
(?<!"")
It worked on the Twitter pattern:
(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)
So the YouTube regex should grab only URLs that do not have a double quote right before them.

To answer the question: (?<!") will fail a match if there is a " immediately before the current location. If there must be no " followed by 0+ other chars before the current location, you may leverage .NET's infinite-width lookbehind.
In this case, you might want to turn your lookbehind into
(?<!"[^"<>]*)
See the regex demo. Note that [^"<>]* matches 0+ chars other than ", < and >, so, the " will be checked only when inside an element node if the HTML is perfectly serialized. If it contains plain < or > inside attribute values, this approach won't work.
That is why you should think about using an appropriate HTML parser for this task, too, since you are already using one in the project. If you let me know what you are trying to achieve, I will update the answer.
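A minimal sketch of how the lookbehind could be combined with the YouTube pattern from the question. The sample HTML string and the variable names are made up for illustration, and the unnecessary \/ escapes are dropped since .NET does not need them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Question's pattern prefixed with the suggested lookbehind.
// (?<!"[^"<>]*) rejects any start position that is preceded by a "
// with no further ", < or > in between, i.e. anywhere inside a quoted attribute value.
var youTubePattern = new Regex(
    @"(?<!""[^""<>]*)(?:https?://)?(?:www\.)?(?:youtube\.com/watch\?[^?]*v=|youtu\.be/)([\w-]+)");

// Hypothetical input: one URL inside an href attribute, one in plain text.
var html = "<a href=\"https://www.youtube.com/watch?v=abc123\">video</a> plain: https://youtu.be/xyz789";

// Collect the captured video IDs (group 1).
var videoIds = youTubePattern.Matches(html)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", videoIds)); // prints: xyz789
```

Note that a plain (?<!") would not be enough here: because the protocol and www. parts are optional, the engine could still start a match at youtube.com inside the attribute, where the preceding character is not a quote.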

Related

Filter URLs vs other text with dots

I am trying to find all the URLs in a string. There are many similar questions on this website, but none matches exactly what I want. My text may contain URLs as well as other text with dots. An example is shown below:
This is a text. It may contain links such as https://stackoverflow.com/, www.test.com but it may also contain other things such as exampleimage.this.png or picture.man.jpeg which is not a URL. On the other hand it many contain URLs without protocol such as example.com
So in the text above, I would only like to get the urls, so mainly -
https://stackoverflow.com/
www.test.com
example.com
But I should not get exampleimage.this.png or picture.man.jpeg as a url.
I have tried
"(?:(?:https?|ftp)://)?[\w/-?=%.]+.[\w/-&?=%.]+"
which gives me all the urls except example.com.
And I have also tried
"(ftp://|www.|https?://){1}[a-zA-Z0-9u00a1-\uffff0-]{2,}.[a-zA-Z0-9u00a1-\uffff0-]{2,}(\S*)"
"(?:(?:https?|ftp)://|\b(?:[a-z\d]+.))(?:(?:[^\s()<>]+|((?:[^\s()<>]+|(?:([^\s()<>]+)))?))+(?:((?:[^\s()<>]+|(?:(?:[^\s()<>]+)))?)|[^\s`!()[]{};:'"".,<>?«»“”‘’])"
which gives me all the urls with the exampleimage.this.png and picture.man.jpeg which is not what I want.
Could anybody please help me out? Anything other than Regex would also be fine.
I am using C# with Regex for this.

You need to match up to a valid TLD:
var pattern = #"(?i)\b(?:(?:http|ftp)s?://|www\.)?[\w/?=%.-]*?\.(?:A(?:A(?:A|RP)|B(?:ARTH|B(?:OTT|VIE)?|C|LE|OGADO|UDHABI)|C(?:ADEMY|C(?:ENTURE|OUNTANTS?)|O|TOR)?|D(?:AC|S|ULT)?|E(?:G|RO|TNA)?|F(?:L|RICA)?|G(?:AKHAN|ENCY)?|I(?:G|R(?:BUS|FORCE|TEL))?|KDN|L(?:FAROMEO|I(?:BABA|PAY)|L(?:FINANZ|STATE|Y)|S(?:ACE|TOM))?|M(?:AZON|E(?:RICAN(?:EXPRESS|FAMILY)|X)|FAM|ICA|STERDAM)?|N(?:ALYTICS|DROID|QUAN|Z)|OL?|P(?:ARTMENTS|P(?:LE)?)|Q(?:UARELLE)?|R(?:A(?:B|MCO)|CHI|MY|PA|TE?)?|S(?:DA|IA|SOCIATES)?|T(?:HLETA|TORNEY)?|U(?:CTION|DI(?:BLE|O)?|SPOST|T(?:HOR|OS?))?|VIANCA|WS?|XA?|Z(?:URE)?)|B(?:A(?:BY|IDU|N(?:A(?:MEX|NAREPUBLIC)|[DK])|R(?:C(?:ELONA|LAY(?:CARD|S))|EFOOT|GAINS)?|S(?:E|KET)BALL|UHAUS|YERN)?|B(?:[CT]|VA)?|C[GN]|D|E(?:A(?:TS|UTY)|ER|NTLEY|RLIN|ST(?:BUY)?|T)?|[FG]|H(?:ARTI)?|I(?:BLE|D|KE|NGO?|[OZ])?|J|L(?:ACK(?:FRIDAY)?|O(?:CKBUSTER|G|OMBERG)|UE)|M[SW]?|N(?:PPARIBAS)?|O(?:ATS|EHRINGER|FA|M|ND|O(?:K(?:ING)?)?|S(?:CH|T(?:IK|ON))|T|UTIQUE|X)?|R(?:ADESCO|IDGESTONE|O(?:ADWAY|KER|THER)|USSELS)?|[ST]|U(?:DAPEST|GATTI|ILD(?:ERS)?|SINESS|Y|ZZ)|[VWY]|ZH?)|C(?:A(?:B|FE|L(?:L|VINKLEIN)?|M(?:ERA|P)?|N(?:CERRESEARCH|ON)|P(?:ETOWN|ITAL(?:ONE)?)|R(?:AVAN|DS|E(?:ERS?)?|S)?|S(?:[AEH]|INO)|T(?:ERING|HOLIC)?)?|B(?:[AN]|RE|S)|[CD]|E(?:NTER|O|RN)|F[AD]?|G|H(?:A(?:N(?:E|NE)L|RITY|SE|T)|EAP|INTAI|R(?:ISTMAS|OME)|URCH)?|I(?:PRIANI|RCLE|SCO|T(?:ADEL|IC?|Y(?:EATS)?))?|K|L(?:AIMS|EANING|I(?:CK|NI(?:C|QUE))|O(?:THING|UD)|UB(?:MED)?)?|[MN]|O(?:ACH|DES|FFEE|L(?:LEG|OGN)E|M(?:CAST|M(?:BANK|UNITY)|P(?:A(?:NY|RE)|UTER)|SEC)?|N(?:DOS|S(?:TRUCTION|ULTING)|T(?:ACT|RACTORS))|O(?:KING(?:CHANNEL)?|[LP])|RSICA|U(?:NTRY|PONS?|RSES))?|PA|R(?:EDIT(?:CARD|UNION)?|ICKET|OWN|S|UISES?)?|SC|U(?:ISINELLA)?|[V-X]|Y(?:MRU|OU)?|Z)|D(?:A(?:BUR|D|NCE|T(?:[AE]|ING|SUN)|Y)|CLK|DS|E(?:AL(?:ER|S)?|GREE|L(?:IVERY|L|OITTE|TA)|MOCRAT|NT(?:AL|IST)|SI(?:GN)?|V)?|HL|I(?:AMONDS|ET|GITAL|RECT(?:ORY)?|S(?:CO(?:UNT|VER)|H)|Y)|[JKM]|NP|O(?:C(?:S|TOR)|G|MAINS|T|WNLOAD)?|RIVE|TV|U(?:BAI|NLOP|PONT|RBAN)|V(?:AG|R)|Z)|E(?:A(?:RTH|T)|CO?|D(?:EKA|
U(?:CATION)?)|[EG]|M(?:AIL|ERCK)|N(?:ERGY|GINEER(?:ING)?|TERPRISES)|PSON|QUIPMENT|R(?:ICSSON|NI)?|S(?:Q|TATE)?|T(?:ISALAT)?|U(?:ROVISION|S)?|VENTS|X(?:CHANGE|P(?:ERT|OSED|RESS)|TRASPACE))|F(?:A(?:GE|I(?:L|RWINDS|TH)|MILY|NS?|RM(?:ERS)?|S(?:HION|T))|E(?:DEX|EDBACK|RR(?:ARI|ERO))|I(?:AT|D(?:ELITY|O)|LM|NA(?:L|NC(?:E|IAL))|R(?:E(?:STONE)?|MDALE)|SH(?:ING)?|T(?:NESS)?)?|[JK]|L(?:I(?:CKR|GHTS|R)|O(?:RIST|WERS)|Y)|M|O(?:O(?:D(?:NETWORK)?|TBALL)?|R(?:D|EX|SALE|UM)|UNDATION|X)?|R(?:E(?:E|SENIUS)|L|O(?:GANS|NT(?:DOO|IE)R))?|TR|U(?:JITSU|ND?|RNITURE|TBOL)|YI)|G(?:A(?:L(?:L(?:ERY|O|UP))?|MES?|P|RDEN|Y)?|B(?:IZ)?|DN?|E(?:A|NT(?:ING)?|ORGE)?|F|G(?:EE)?|H|I(?:FTS?|V(?:ES|ING))?|L(?:ASS|E|OB(?:AL|O))?|M(?:AIL|BH|[OX])?|N|O(?:DADDY|L(?:D(?:POINT)?|F)|O(?:DYEAR|G(?:LE)?)?|[PTV])|[PQ]|R(?:A(?:INGER|PHICS|TIS)|EEN|IPE|O(?:CERY|UP))?|[ST]|U(?:ARDIAN|CCI|GE|I(?:DE|TARS)|RU)?|[WY])|H(?:A(?:IR|MBURG|NGOUT|US)|BO|DFC(?:BANK)?|E(?:ALTH(?:CARE)?|L(?:P|SINKI)|R(?:E|MES))|GTV|I(?:PHOP|SAMITSU|TACHI|V)|KT?|[MN]|O(?:CKEY|L(?:DINGS|IDAY)|ME(?:DEPOT|GOODS|S(?:ENSE)?)|NDA|RSE|S(?:PITAL|T(?:ING)?)|T(?:EL(?:E)?S|MAIL)?|USE|W)|R|SBC|T|U(?:GHES)?|Y(?:ATT|UNDAI))|I(?:BM|C(?:BC|[EU])|D|E(?:EE)?|FM|KANO|L|M(?:AMAT|DB|MO(?:BILIEN)?)?|N(?:C|DUSTRIES|F(?:INITI|O)|[GK]|S(?:TITUTE|UR(?:ANC)?E)|T(?:ERNATIONAL|UIT)?|VESTMENTS)?|O|PIRANGA|Q|R(?:ISH)?|S(?:MAILI|T(?:ANBUL)?)?|T(?:AU|V)?)|J(?:A(?:GUAR|VA)|CB|E(?:EP|TZT|WELRY)?|IO|LL|MP?|NJ|O(?:B(?:S|URG)|[TY])?|P(?:MORGAN|RS)?|U(?:EGOS|NIPER))|K(?:AUFEN|DDI|E(?:RRY(?:HOTEL|LOGISTIC|PROPERTIE)S)?|FH|[GH]|I(?:[AM]|ND(?:ER|LE)|TCHEN|WI)?|[MN]|O(?:ELN|MATSU|SHER)|P(?:MG|N)?|R(?:D|ED)?|UOKGROUP|W|Y(?:OTO)?|Z)|L(?:A(?:CAIXA|M(?:BORGHINI|ER)|N(?:C(?:ASTER|IA)|D(?:ROVER)?|XESS)|SALLE|T(?:INO|ROBE)?|W(?:YER)?)?|[BC]|DS|E(?:ASE|CLERC|FRAK|G(?:AL|O)|XUS)|GBT|I(?:DL|FE(?:INSURANCE|STYLE)?|GHTING|KE|LLY|M(?:ITED|O)|N(?:COLN|DE|K)|PSY|V(?:E|ING))?|K|L[CP]|O(?:ANS?|C(?:KER|US)|FT|L|NDON|TT[EO]|VE)|PL(?:FINANCIAL)?|[RS]|T(?:DA?)?|U(?:NDBECK|X(?:E|URY))?|[VY])|M(?:A(?:CYS|DRID|I(?:F
|SON)|KEUP|N(?:AGEMENT|GO)?|P|R(?:KET(?:ING|S)?|RIOTT|SHALLS)|SERATI|TTEL)?|BA|C(?:KINSEY)?|D|E(?:D(?:IA)?|ET|LBOURNE|M(?:E|ORIAL)|NU?|RCKMSD)?|[GH]|I(?:AMI|CROSOFT|L|N[IT]|T(?:SUBISHI)?)|K|L[BS]?|MA?|N|O(?:BI(?:LE)?|DA|[EIM]|N(?:ASH|EY|STER)|R(?:MON|TGAGE)|SCOW|TO(?:RCYCLES)?|V(?:IE)?)?|[P-R]|SD?|T[NR]?|U(?:S(?:EUM|IC)|TUAL)?|[V-Z])|N(?:A(?:B|GOYA|ME|TURA|VY)?|BA|C|E(?:C|T(?:BANK|FLIX|WORK)?|USTAR|WS?|X(?:T(?:DIRECT)?|US))?|FL?|GO?|HK|I(?:CO|K(?:E|ON)|NJA|SSA[NY])?|L|O(?:KIA|RT(?:HWESTERNMUTUAL|ON)|W(?:RUZ|TV)?)?|P|R[AW]?|TT|U|YC|Z)|O(?:B(?:I|SERVER)|FFICE|KINAWA|L(?:AYAN(?:GROUP)?|DNAVY|LO)|M(?:EGA)?|N(?:[EG]|L(?:INE)?)|OO|PEN|R(?:A(?:CL|NG)E|G(?:ANIC)?|IGINS)|SAKA|T(?:SUKA|T)|VH)|P(?:A(?:GE|NASONIC|R(?:IS|S|T(?:NERS|[SY]))|SSAGENS|Y)?|CCW|ET?|F(?:IZER)?|G|H(?:ARMACY|D|ILIPS|O(?:NE|TO(?:GRAPHY|S)?)|YSIO)?|I(?:C(?:S|T(?:ET|URES))|D|N[GK]?|ONEER|ZZA)|K|L(?:A(?:CE|Y(?:STATION)?)|U(?:MBING|S))?|M|NC?|O(?:HL|KER|LITIE|RN|ST)|R(?:A(?:MERICA|XI)|ESS|IME|O(?:D(?:UCTIONS)?|F|GRESSIVE|MO|PERT(?:IES|Y)|TECTION)?|U(?:DENTIAL)?)?|[ST]|UB|WC?|Y)|Q(?:A|PON|UE(?:BEC|ST))|R(?:A(?:CING|DIO)|E(?:A(?:D|L(?:ESTATE|T(?:OR|Y)))|CIPES|D(?:STONE|UMBRELLA)?|HAB|I(?:SEN?|T)|LIANCE|N(?:T(?:ALS)?)?|P(?:AIR|ORT|UBLICAN)|ST(?:AURANT)?|VIEWS?|XROTH)?|I(?:C(?:H(?:ARDLI)?|OH)|[LOP])|O(?:C(?:HER|KS)|DEO|GERS|OM)?|S(?:VP)?|U(?:GBY|HR|N)?|WE?|YUKYU)|S(?:A(?:ARLAND|FE(?:TY)?|KURA|L(?:E|ON)|MS(?:CLUB|UNG)|N(?:DVIK(?:COROMANT)?|OFI)|P|RL|S|VE|XO)?|B[IS]?|C(?:[AB]|H(?:AEFFLER|MIDT|O(?:LARSHIPS|OL)|ULE|WARZ)|IENCE|OT)?|D|E(?:A(?:RCH|T)|CUR(?:E|ITY)|EK|LECT|NER|RVICES|S|VEN|W|XY?)?|FR|G|H(?:A(?:NGRILA|RP|W)|ELL|I(?:KSH)?A|O(?:ES|P(?:PING)?|UJI|W(?:TIME)?))?|I(?:LK|N(?:A|GLES)|TE)?|J|K(?:IN?|Y(?:PE)?)?|L(?:ING)?|M(?:ART|ILE)?|N(?:CF)?|O(?:C(?:CER|IAL)|FT(?:BANK|WARE)|HU|L(?:AR|UTIONS)|N[GY]|Y)?|P(?:A(?:CE)?|O(?:R)?T)|RL?|S|T(?:A(?:DA|PLES|R|TE(?:BANK|FARM))|C(?:GROUP)?|O(?:CKHOLM|R(?:AG)?E)|REAM|UD(?:IO|Y)|YLE)?|U(?:CKS|PP(?:L(?:IES|Y)|ORT)|R(?:F|GERY)|ZUKI)?|V|W(?:ATCH|ISS)|X|Y(?:DNEY|STEMS)?|Z)|T(?:A(?:B|
IPEI|LK|OBAO|RGET|T(?:A(?:MOTORS|R)|TOO)|XI?)|CI?|DK?|E(?:AM|CH(?:NOLOGY)?|L|MASEK|NNIS|VA)|[FG]|H(?:D|EAT(?:ER|RE))?|I(?:AA|CKETS|ENDA|FFANY|PS|R(?:ES|OL))|J(?:MAXX|X)?|K(?:MAXX)?|L|M(?:ALL)?|N|O(?:DAY|KYO|OLS|P|RAY|SHIBA|TAL|URS|WN|Y(?:OTA|S))?|R(?:A(?:D(?:E|ING)|INING|VEL(?:CHANNEL|ERS(?:INSURANCE)?)?)|UST|V)?|T|U(?:BE|I|NES|SHU)|VS?|[WZ])|U(?:A|B(?:ANK|S)|[GK]|N(?:I(?:COM|VERSITY)|O)|OL|PS|[SYZ])|V(?:A(?:CATIONS|N(?:A|GUARD))?|C|E(?:GAS|NTURES|R(?:ISIGN|SICHERUNG)|T)?|G|I(?:AJES|DEO|G|KING|LLAS|[NP]|RGIN|S(?:A|ION)|V[AO])?|LAANDEREN|N|O(?:DKA|L(?:KSWAGEN|VO)|T(?:E|ING|O)|YAGE)|U(?:ELOS)?)|W(?:A(?:L(?:ES|MART|TER)|NG(?:GOU)?|TCH(?:ES)?)|E(?:ATHER(?:CHANNEL)?|B(?:CAM|ER|SITE)|D(?:DING)?|I(?:BO|R))|F|HOSWHO|I(?:EN|KI|LLIAMHILL|N(?:DOWS|E|NERS)?)|ME|O(?:LTERSKLUWER|ODSIDE|R(?:KS?|LD)|W)|S|T[CF])|X(?:BOX|EROX|FINITY|I(?:HUA)?N|N--(?:1(?:1B4C3D|CK2E1B|QQW23A)|2SCRJ9C|3(?:0RR7Y|BST00M|DS443G|E0B707E|HCRJ9C|PXU8K)|4(?:2C2D9A|5(?:BR(?:5CYL|J9C)|Q11C)|DBRK0CE|GBRIM)|5(?:4B7FTA0CC|5Q(?:W42G|X5D)|SU34J936BGSG|TZM5G)|6(?:FRZ82G|QQ986B3XL)|8(?:0A(?:DXHKS|O21A|QECDR1A|S(?:EHDB|WG))|Y0A063A)|9(?:0A(?:3AC|E|IS)|DBQ2A|ET52U|KRT00A)|B(?:4W605FERD|CK1B9A5DRE4C)|C(?:1AVG|2BR7G|CK(?:2B3B|WCXETD)|G4BKI|LCHC0EA0B2G2A9GCD|ZR(?:694B|S0T|U2D))|D1A(?:CJ3B|LF)|E(?:1A4C|CKVDTC9D|FVY88H)|F(?:CT429K|HBEI|IQ(?:228C5HS|64B|S8S|Z9S)|JQ720A|LW351E|PCRJ9C3D|Z(?:C2C9E2C|YS8D69UVGM))|G(?:2XX48C|CKR3F0F|ECRJ9C|K3AT1E)|H(?:2BR(?:EG3EVE|J9C(?:8C)?)|XT814E)|I(?:1B6B1A6A2E|MR513N|O0A7I)|J(?:1A(?:EF|MH)|6W193G|LQ(?:480N2RG|61U9W7B)|VR189M)|K(?:CRX77D1X4A|P(?:R(?:W13|Y57)D|UT3I))|L(?:1ACC|GBBAT1AD8J)|M(?:GB(?:9AWBF|A(?:3A(?:3EJT|4F16A)|7C0BBN0A|A(?:KC7DVF|M7A8H)|B2BD|H1A3HJKRD|I9AZGQP6J|YH7GPA)|BH1A(?:71E)?|C(?:0A9AZCG|A7DZDO|PQ6GPA1A)|ERP4A5D4AR|GU82A|I4ECEXP|PL2FH|T(?:3DHD|X2B)|X4CD0AB)|IX891F|K1BU44C|XTQ1M)|N(?:GB(?:C5AZD|E9E0A|RX)|ODE|QV7F(?:S00EMA)?|YQY26A)|O(?:3CW4H|GBPF8FL|TU796D)|P(?:1A(?:CF|I)|GBS0DH|SSY2U)|Q(?:7CE6A|9JYB4C|CKA1PMC|XA(?:6A|M))|R(?:HQV96G|OVU88B|VC1E0AM3E)|S(?:9BRJ9C|ES554G)|T(?:60B
56A|CKWE|IQ49XQYJ)|UNUP4Y|V(?:ERMGENSBERAT(?:ER-CT|UNG-PW)B|HQUV|UQ861B)|W(?:4R(?:85EL8FHU5DNRA|S40L)|GB(?:H1C|L6A))|X(?:HQ521B|KC2(?:AL3HYE2A|DL3A5EE0H))|Y(?:9A3AQ|FRO4I67O|GBI2AMMX)|ZFR164B)|XX|YZ)|Y(?:A(?:CHTS|HOO|MAXUN|NDEX)|E|O(?:DOBASHI|GA|KOHAMA|U(?:TUBE)?)|T|UN)|Z(?:A(?:PPOS|RA)?|ERO|IP|M|ONE|UERICH|W))\b(?!\.\w)/?";
var output = Regex.Matches(text, pattern).Cast<Match>().Select(x => x.Value);
See the regex demo. Details:
(?i) - case insensitive matching ON
\b - a word boundary
(?:(?:http|ftp)s?://|www\.)? - an optional sequence of http or ftp followed with an optional s and then ://, or www.
[\w/?=%.-]*? - zero or more word, /, ?, =, %, . or - chars as few as possible
\. - a . char
(?:<TLD_PATTERN>) - a pattern that matches any TLD (listed in the IANA's TLD DB)
\b - a word boundary
(?!\.\w) - fail the match if there is . and a word char immediately to the right of the current location
/? - an optional / char.
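To see the structure of the pattern without the full TLD list, here is a small sketch that swaps in a deliberately tiny, hypothetical TLD alternation (com|org|net|io). It is only meant to illustrate the idea; real input needs the complete list shown above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same shape as the pattern above, but with a reduced TLD list (an assumption for brevity).
var pattern = @"(?i)\b(?:(?:http|ftp)s?://|www\.)?[\w/?=%.-]*?\.(?:com|org|net|io)\b(?!\.\w)/?";

// Shortened version of the sample text from the question.
var text = "Links: https://stackoverflow.com/, www.test.com and example.com, " +
           "but not exampleimage.this.png or picture.man.jpeg.";

var urls = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

foreach (var url in urls)
{
    Console.WriteLine(url);
}
```

The file names are skipped because neither png nor jpeg is in the TLD alternation, and (?!\.\w) stops a match from ending in the middle of a dotted name.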

Use OR in Regex Expression

I have a regex to match the following:
somedomain.com/services/something
Basically I need to ensure that /services is present.
The regex I am using and which is working is:
\/services*
But I need to match /services OR /servicos. I tried the following:
(\/services|\/servicos)*
But this shows 24 matches?! https://regex101.com/r/jvB1lr/1
How to create this regex?

The (\/services|\/servicos)* matches 0+ occurrences of /services or /servicos, and that means it can match an empty string anywhere inside the input string.
You can group the alternatives like /(services|servicos) and remove the * quantifier, but for this case, it is much better to use a character class [oe] as the strings only differ in 1 char.
You want to use the following pattern:
/servic[eo]s
See the regex demo
To make sure you match a whole subpart, you may append (?:/|$) at the pattern end, /servic[eo]s(?:/|$).
In C#, you may use Regex.IsMatch with the pattern to see if there is a match in a string:
var isFound = Regex.IsMatch(s, @"/servic[eo]s(?:/|$)");
Note that you do not need to escape / in a .NET regex as it is not a special regex metacharacter.
Pattern details
/ - a /
servic[eo]s - services or servicos
(?:/|$) - / or end of string.
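A short sketch comparing the starred group from the question with the character-class pattern. The test URLs other than the one from the question are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var url = "somedomain.com/services/something";

// The starred group may match the empty string, so it reports a "match"
// at nearly every position in the input.
int starredCount = Regex.Matches(url, @"(/services|/servicos)*").Count;

// The character-class pattern matches only the real path segment.
int fixedCount = Regex.Matches(url, @"/servic[eo]s(?:/|$)").Count;

Console.WriteLine($"starred: {starredCount}, fixed: {fixedCount}");

// The same pattern accepts the Portuguese spelling and rejects a longer segment.
bool matchesServicos = Regex.IsMatch("somedomain.com/servicos", @"/servic[eo]s(?:/|$)");
bool matchesLonger = Regex.IsMatch("somedomain.com/servicesx", @"/servic[eo]s(?:/|$)");
Console.WriteLine($"{matchesServicos}, {matchesLonger}");
```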

Well the * quantifier means zero or more, so that is the problem. Remove that and it should work fine:
(\/services|\/servicos)
Keep in mind that in your example, you have a typo in the URL so it will correctly not match anything as it stands.
Here is an example with the typo in the URL fixed, so it shows 1 match as expected.

First off, you specify C# in this post (really, .NET is the library that provides regex, not the language), but regex101 in your example is set to PHP. That gives you misleading information, such as the need to escape a forward slash / as \/, which is unnecessary in .NET regular expressions. The regex language is largely the same, but different engines behave differently, and PHP regex is not like .NET regex.
Secondly, the star * on the ( ) says that there may be nothing in the parentheses, so you get an empty match at every position where nothing else matches.
Thirdly, one does not need to spell out the whole word twice. I would just put the one character that differs between the words into a set [ ]. That gives you the "or-ness" you need to match either services or servicos, such as
(/servic[oe]s)
This will tell you whether services is found or not. Nothing else is needed.

Regex for allowing semi colon

I have a regex for validating a string, but it doesn't accept semicolons. Is it because I have to use some escape sequences? I tested my regex here and it passes, i.e. it allows the semicolon, but it doesn't in my C# app.
EDITED: I have the following regex
^[A-Za-z0-9]{1}[A-Za-z.&amp;0-9\s\\-]{0,21}$
and tried validating: sar232 trading inc;

The &amp; entity hints at the fact that you have this regular expression inside some XML attribute, and that this &amp; gets parsed as a single & symbol when the pattern is sent to the regex engine.
That means your pattern lacks the semicolon inside the second character class, and that is why your regex does not match the string you provided.
The solution is simple: add the semicolon to the 2nd character class:
someattr="^[A-Za-z0-9][;A-Za-z.&amp;0-9\s\\-]{0,21}$"
                       ^
See the regex demo
Please also note that the {1} limiting quantifier is redundant since a [A-Za-z0-9] already matches only 1 symbol from the indicated ranges.
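As a quick check of the explanation above, this sketch runs the input from the question against the pattern as the regex engine would see it after the XML entity is decoded, with and without the added semicolon. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "sar232 trading inc;";

// Pattern after XML decoding: the class contains & but no semicolon.
var withoutSemicolon = @"^[A-Za-z0-9][A-Za-z.&0-9\s\\-]{0,21}$";

// Same pattern with ; added to the second character class.
var withSemicolon = @"^[A-Za-z0-9][;A-Za-z.&0-9\s\\-]{0,21}$";

bool before = Regex.IsMatch(input, withoutSemicolon);
bool after = Regex.IsMatch(input, withSemicolon);

Console.WriteLine($"without ';': {before}, with ';': {after}");
```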

how to change xml tag format like <xml></xml> to <xml/>

I have an XML file. As per my requirement, I need to update empty tags, i.e. change <xml></xml> to <xml/>. Is it possible to change the tags like that?
Thank you...

var xmlString="<xml></xml> <toto></toto>";
var properString=System.Text.RegularExpressions.Regex.Replace(xmlString, "<([^>]+)></[^>]+>", "<$1/>");
EDIT: explanation!
@Neil Knight has already provided, in a comment, a link to Wikipedia explaining the concept of regular expressions. The part specific to .NET is available here: .NET Framework Regular Expressions
A starting XML tag can be matched with the following regular expression: <[^>]+>. The [^>]+ part can be read as: all characters that are not ">", with at least one character (so <> is not matched but <a> is). An ending XML tag can be matched with the same kind of expression: </[^>]+> (note the slash after the first character). So the regular expression <[^>]+></[^>]+> matches empty tags such as <foo></foo> (but be careful, it also matches <foo></bar> which is not valid XML code).
What we need now is to isolate the characters between "<" and ">". For that, we use parentheses: <([^>]+)>. This instructs the regular expression engine to capture the matched characters. Each pair of parentheses can be referred to later in a replacement operation by the "$x" string (where "x" is a number: "$1" for the first capturing group, "$2" for the second one, etc.).
So, with a call to Regex.Replace(xmlString, "<([^>]+)></[^>]+>", "<$1/>"), <foo></foo> will be replaced by <foo/> ("foo" characters are captured, and "$1" is replaced by them). <foo></bar> will also be replaced by <foo/>.
I hope that this explanation is enough for @Felix K. ;o)
(my English is not so good, that's why I did not provide many details)
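One possible variation, not part of the answer above: a backreference (\1) in the closing tag makes the pattern require the same name in both tags, so the invalid <foo></bar> case is left alone. This is only a sketch; it still does not handle tags with attributes or whitespace between the tags.

```csharp
using System;
using System.Text.RegularExpressions;

var xmlString = "<xml></xml> <toto></toto> <foo></bar>";

// ([^>]+) captures the tag name; \1 requires the closing tag to repeat it.
var properString = Regex.Replace(xmlString, @"<([^>]+)></\1>", "<$1/>");

Console.WriteLine(properString); // prints: <xml/> <toto/> <foo></bar>
```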

if (someElement.innerText == string.Empty)
{
someElement.innerText = null;
}
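The snippet above does not say which XML API it targets. As a hedged alternative, assuming the file is loaded into System.Xml.XmlDocument, the XmlElement.IsEmpty property can be set to request the short tag form. The sample document is made up.

```csharp
using System;
using System.Linq;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<root><xml></xml><toto>text</toto></root>");

// Take a snapshot of all elements first, because the node list is live.
var elements = doc.GetElementsByTagName("*").Cast<XmlElement>().ToList();

foreach (var element in elements)
{
    // Only elements without any child nodes are switched to the short form.
    if (!element.HasChildNodes)
    {
        element.IsEmpty = true;
    }
}

Console.WriteLine(doc.OuterXml);
```

Unlike the regex approach, this works on the parsed tree, so attributes and nesting are handled by the parser.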

Regular Expressions Help in C#

I have a regular expression I'm using to remove html tags and now I'm wondering if there is any way to modify it so that it could also remove links beginning with http and ending with .stm or .gif?
This is the piece of code I'm using:
string BBCSplit = Regex.Replace(BBC, @"<(.|\n)*?>", string.Empty);

The best way to figure out regular expressions is through example, trial and error.
Put your html text in this site, along with your regexp and if it turns yellow, it's matched.
If you need a tutorial on how regexp works, I found this site to be very useful.
The regexp you'll want will be something like http:.*\.stm - which means "the characters http, followed by 0 or more characters (.*) followed by the characters .stm".
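A rough sketch of how this could look in C#. It swaps the greedy .* for a lazy \S*? (my assumption, so that one match cannot run across whitespace into a second link) and adds .gif as an alternative. The input string is invented.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with one tag pair and two links.
var bbc = "<p>News</p> http://news.bbc.co.uk/story.stm and http://example.com/pic.gif end";

// Step 1: strip HTML tags with the pattern from the question.
var noTags = Regex.Replace(bbc, @"<(.|\n)*?>", string.Empty);

// Step 2: strip links that start with http and end with .stm or .gif.
var noLinks = Regex.Replace(noTags, @"http\S*?\.(?:stm|gif)", string.Empty);

Console.WriteLine(noLinks);
```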
