Hi all, I am trying to match the address on this page.
This is the HTML for the address portion:

    <tr><td align="right" class="generalinfo_left">Address:</td><td class="generalinfo_right">1 S Main Street 1430<br/></td></tr>
    <tr><td align="right" class="generalinfo_left"></td><td class="generalinfo_right">Dayton, OH 45402</td></tr>

So I tried the following regex in PHP:

    "%Address:(.*?)(?!<br/>)%s"

where the "s" modifier makes "." match newlines too, but it is not working: it does not match the "Dayton, OH 45402" part. Can anyone tell me what is wrong?
That is expected: if you look at your sample text, you will see that there is a <br/> between "Address:" and "Dayton, OH 45402".
You should use a parser for (X)HTML.
If all your files really look like this sample, this ugly regex should work:

    %(Address:)(.*?generalinfo_right">)(.*?)((<br/>)|(</td>)).*?(generalinfo_right">)(.*?)((<br/>)|(</td>))%s

The information you want is in groups 1, 3 and 8.
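As a rough illustration of how the capture groups line up, here is a minimal sketch that runs a cleaned-up form of the pattern through preg_match. The `$html` string is the question's sample with the entities decoded, and the variable names `$street` and `$city` are only illustrative, not something from the original post.

```php
<?php
// Sample markup from the question, entities decoded (layout assumed from the post).
$html = '<tr><td align="right" class="generalinfo_left">Address:</td>'
      . '<td class="generalinfo_right">1 S Main Street 1430<br/></td></tr>' . "\n"
      . '<tr><td align="right" class="generalinfo_left"></td>'
      . '<td class="generalinfo_right">Dayton, OH 45402</td></tr>';

// "%" is the delimiter; the trailing "s" modifier lets "." match newlines too.
$pattern = '%(Address:)(.*?generalinfo_right">)(.*?)((<br/>)|(</td>))'
         . '.*?(generalinfo_right">)(.*?)((<br/>)|(</td>))%s';

$street = null;
$city   = null;
if (preg_match($pattern, $html, $m)) {
    $street = trim($m[3]); // group 3: text of the first "generalinfo_right" cell
    $city   = trim($m[8]); // group 8: text of the second "generalinfo_right" cell
}

echo $street, "\n", $city, "\n";
```

Group 1 only holds the literal "Address:" label, so in practice groups 3 and 8 are the ones worth reading. The pattern still depends on the exact class names and cell order, so any change in the markup can break it.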
However, since your documents are most likely not all exactly alike, a better solution would be to parse the HTML with a proper parser.
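For comparison, a parser-based version might look like the sketch below, using PHP's bundled DOMDocument and DOMXPath. It assumes the address always spans exactly two rows (the "Address:" row and the row right after it) and wraps the sample rows in a `<table>` so the fragment parses cleanly. Both are assumptions about the page, not facts from the question.

```php
<?php
// Sample rows from the question, wrapped in <table> (wrapper added for this sketch).
$html = '<table>'
      . '<tr><td align="right" class="generalinfo_left">Address:</td>'
      . '<td class="generalinfo_right">1 S Main Street 1430<br/></td></tr>'
      . '<tr><td align="right" class="generalinfo_left"></td>'
      . '<td class="generalinfo_right">Dayton, OH 45402</td></tr>'
      . '</table>';

libxml_use_internal_errors(true); // collect parse warnings instead of emitting them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Locate the row whose label cell reads "Address:".
$row = $xpath->query(
    '//tr[td[@class="generalinfo_left"][normalize-space(.)="Address:"]]'
)->item(0);

$addressLines = [];
if ($row !== null) {
    // First line: the value cell in the same row.
    $addressLines[] = trim($xpath->evaluate(
        'string(td[@class="generalinfo_right"])', $row
    ));
    // Second line: the value cell in the next row (assumed to continue the address).
    $addressLines[] = trim($xpath->evaluate(
        'string(following-sibling::tr[1]/td[@class="generalinfo_right"])', $row
    ));
}

echo implode("\n", $addressLines), "\n";
```

Unlike the regex, this does not care about attribute order, whitespace, or whether the line break is written as <br/> or <br />.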