Error » Microsoft Error! » Microsoft Operating Systems Error » Microsoft Windows xp error » Regex to match certain HTML attributes

Microsoft Windows xp error all errors and bugs related to Microsoft winxp error

Post New Thread Reply
  Regex to match certain HTML attributes
LinkBack Thread Tools Display Modes
Old 27-Feb-2007, 03:29 AM   #1 (permalink)
Fixed Error!
 
Sangeetha's Avatar

Posts: 141
Location: Chennai
Join Date: Feb 2007
Rep Power: 2 Sangeetha is on a distinguished road

IM:
Default Regex to match certain HTML attributes

Regexes are sometimes quite challenging. I've been banging my head on this table for hours now and want to stop. Please help me get rid of this headache!

I need to remove all style and class attributes in an HTML file whilst leaving all other attributes untouched. I just need the regex for this - I've written a generic filter that uses the Regex, but I just can't seem to get this one to work (I'm failing to get the regex to ignore other attributes between the tag and the style=...).

Given the following HTML (which came from pasting from the trully awful MS Werd - I really couldn't invent this rubbish if I tried!):

<H1 style="MARGIN: 0cm 0cm 0pt"><FONT color=#000000>blah blah<SPAN style="mso-spacerun: yes">&nbsp; </SPAN></font></H1>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><?xml:namespace prefix = st1 ns = "urn:schemas-microsoft-comffice:smarttags" /><st1:PlaceName w:st="on"><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">blah blah</SPAN></st1:PlaceName><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">

I need just the Regex and the Replacement strings. It should:
- remove (match) style and class attributes
- work with and without quotes - note that 'Century Gothic' is wrapped with single quotes
- assume the attribute quotes are "double" (or missing)
- the attributes must be allowed to be in *any* order in the tag
- all other attributes and tags must be left in situ

I've other regexes that clean the rest of the vomit - at least ten of them!

For a bonus, if anyone has the name of the idiot who created the Werd HTML engine..... I'd just love to write to his/her mother and tell her how her child is messing with people's heads
Sangeetha is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!Spurl this Post!Reddit!
Reply With Quote
   


   
Old 27-Feb-2007, 03:29 AM   #2 (permalink)
Fixed Error!
 
Sangeetha's Avatar

Posts: 141
Location: Chennai
Join Date: Feb 2007
Rep Power: 2 Sangeetha is on a distinguished road

IM:
Default Re: Regex to match certain HTML attributes

$_=q<
<H1 style="MARGIN: 0cm 0cm 0pt"><FONT color=#000000>blah blah<SPAN style="mso-spacerun: yes">&nbsp; </SPAN></font></H1>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><?xml:namespace prefix = st1 ns = "urn:schemas-microsoft-comffice:smarttags" /><st1:PlaceName w:st="on"><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">blah blah</SPAN></st1:PlaceName><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">
>;

s/(<[^<>]*?)\b((class|style)=("[^"]+?"|\S+)\s*)+([^<>]*>)/$1$5/g;

print;
Sangeetha is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!Spurl this Post!Reddit!
Reply With Quote
Post New Thread Reply


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT -8. The time now is 05:30 PM.

Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0

DMCA Policy

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227