Regex to match certain HTML attributes
#1
Fixed Error!
Posts: 141
Location: Chennai
Join Date: Feb 2007
Rep Power: 2
I need to remove all style and class attributes from an HTML file while leaving all other attributes untouched. I just need the regex for this. I've written a generic filter that uses the regex, but I just can't seem to get this one to work: I'm failing to get the regex to ignore other attributes between the tag name and the style=...

Given the following HTML (which came from pasting from the truly awful MS Werd - I really couldn't invent this rubbish if I tried!):

<H1 style="MARGIN: 0cm 0cm 0pt"><FONT color=#000000>blah blah<SPAN style="mso-spacerun: yes"> </SPAN></font></H1>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><?xml:namespace prefix = st1 ns = "urn:schemas-microsoft-com:office:smarttags" /><st1:PlaceName w:st="on"><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">blah blah</SPAN></st1:PlaceName><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">

I need just the regex and the replacement strings. It should:

- remove (match) style and class attributes
- work with and without quotes
- note that 'Century Gothic' is wrapped with single quotes
- assume the attribute quotes are "double" (or missing)
- the attributes must be allowed to be in *any* order in the tag
- all other attributes and tags must be left in situ

I've other regexes that clean the rest of the vomit - at least ten of them!

For a bonus, if anyone has the name of the idiot who created the Werd HTML engine... I'd just love to write to his/her mother and tell her how her child is messing with people's heads.
#2
# Sample HTML from the question, held in $_ (q<...> is safe here because the angle brackets are balanced)
$_ = q<<H1 style="MARGIN: 0cm 0cm 0pt"><FONT color=#000000>blah blah<SPAN style="mso-spacerun: yes"> </SPAN></font></H1> <P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><?xml:namespace prefix = st1 ns = "urn:schemas-microsoft-com:office:smarttags" /><st1:PlaceName w:st="on"><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">blah blah</SPAN></st1:PlaceName><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">>;
# Drop each run of adjacent class/style attributes, double-quoted or unquoted.
# $1 is the tag text before the run, $5 is the rest of the tag.
s/(<[^<>]*?)\b((class|style)=("[^"]+?"|\S+)\s*)+([^<>]*>)/$1$5/g;
print;
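
For anyone whose filter is not written in Perl, here is a rough port of the same pattern to Python's `re` module. It is only a sketch: the function name `strip_style_and_class` is made up for illustration, and the loop is an addition that is not in the one-liner above. The loop is there because a single pass removes only one run of adjacent class/style attributes per tag, so a tag such as `<P class=a id=b style="c">` would otherwise keep its style attribute.

```python
import re

# Same pattern as the Perl substitution above; the replacement keeps
# group 1 (tag text before the attributes) and group 5 (rest of the tag).
PATTERN = re.compile(r'(<[^<>]*?)\b((class|style)=("[^"]+?"|\S+)\s*)+([^<>]*>)')


def strip_style_and_class(html: str) -> str:
    """Remove class/style attributes, repeating until nothing changes.

    Hypothetical helper: one pass strips a single run of adjacent
    class/style attributes per tag, so we loop to catch attributes
    separated by other attributes.
    """
    previous = None
    while previous != html:
        previous = html
        html = PATTERN.sub(r'\1\5', html)
    return html


if __name__ == "__main__":
    sample = ('<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify">'
              '<FONT color=#000000>blah blah</FONT></P>')
    print(strip_style_and_class(sample))
```

Two things to be aware of: the space that preceded the removed attributes is left behind (`<H1 style="...">` becomes `<H1 >`), and because the pattern anchors on `\b`, an attribute whose name merely ends in `-class` or `-style` would also be hit. Neither occurs in the sample HTML, but a stricter filter may want to handle them.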