linux - Compare two websites and see if they are "equal"?


We are migrating web servers, and it would be nice to check automatically, at least for the basic site structure, whether the pages served by the new server are similar to those on the old server. Does anyone know of anything that could help with this?

Dump the formatted output of each page (here with w3m, but links works too):

  w3m -dump http://google.com 2>/dev/null > /tmp/1.html
  w3m -dump http://google.de 2>/dev/null > /tmp/2.html
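If w3m or links is not installed, a rough stand-in for the "dump the visible text" step can be sketched in Python with only the standard library. This is a swapped-in approach, not what the answer uses, and it is much cruder than w3m: it only drops tags and script/style contents and collapses whitespace, with no layout rendering. The function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: strip markup and normalise whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all whitespace so pure layout differences do not count.
    return " ".join(" ".join(parser.parts).split())


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page and return its text."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))
```

For example, fetch_text("http://google.com") gives one line of words that can be written to /tmp/1.html and compared exactly like the w3m dumps.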

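Then use wdiff. It tells you what percentage of the words the two texts have in common, and it makes the differences easy to see (-s prints the statistics, -i ignores case, -n keeps changes from spanning line breaks):

  wdiff -nis /tmp/1.html /tmp/2.html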

Example run:

  wdiff -nis /tmp/1.html /tmp/2.html
  Web Images Vidéos Maps [-Versical-] Livres {+Traduction+} Gmail plus » [-iGoogle |-] Paramètres | Connexion Google [hp1] [hp2] [hpi] [-France-] {+Deutschland+} [] Recherche avancée Outils linguistiques [Recherche Google] [J'ai de la chance]
  /tmp/1.html: 43 words  39 90% common  3 6% deleted  1 2% changed
  /tmp/2.html: 49 words  39 79% common  9 18% inserted  1 2% changed

(Yes, both pages actually came out in French.)

The percentages give an overall measure of how similar the two texts are, and on top of that you can easily see the differences word by word instead of line by line (though the output can look a bit messy).
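To turn these percentages into an automatic pass/fail check over many pages, the word-level comparison can be approximated in Python with difflib. This is a minimal sketch assuming the plain text of each page has already been extracted; difflib's matching is not the algorithm wdiff uses, so its numbers may differ slightly from wdiff's. The function names and the 90% threshold are hypothetical choices.

```python
import difflib


def word_stats(old_text: str, new_text: str) -> dict:
    """Rough analogue of the `wdiff -s` statistics, comparing word by word."""
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False disables difflib's "popular word" heuristic, which would
    # otherwise kick in for texts of 200+ words and skew the counts.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    common = sum(block.size for block in matcher.get_matching_blocks())

    def pct(n: int, total: int) -> int:
        # Integer percentage; an empty text has nothing missing, so 100.
        return 100 * n // total if total else 100

    return {
        "old_words": len(old_words),
        "new_words": len(new_words),
        "common": common,
        "old_common_pct": pct(common, len(old_words)),
        "new_common_pct": pct(common, len(new_words)),
    }


def looks_equal(old_text: str, new_text: str, threshold: int = 90) -> bool:
    """Hypothetical check: both sides must share at least `threshold`% of words."""
    stats = word_stats(old_text, new_text)
    return min(stats["old_common_pct"], stats["new_common_pct"]) >= threshold
```

Taking the minimum of the two percentages means a page fails both when words disappeared on the new server and when extra words appeared, which mirrors the two separate lines wdiff prints. For the dumps above, something like looks_equal(open("/tmp/1.html").read(), open("/tmp/2.html").read()) would flag the pair for a closer look with wdiff.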

