php - How can I make a function that repeats itself until it finds ALL the information?


I want to create a PHP function that goes through a website's homepage, finds all the links on the homepage, goes through the links it finds, and keeps going until every link on that website has been visited. I really need to build something like this so I can spider my network of sites and supply a "one stop" for searching.

Here's what I have so far:

    function spider($urltospider, $current_array = array(), $ignore_array = array('')) {
        if (empty($current_array)) {
            // Make the request to the original URL
            $session = curl_init($urltospider);
            curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
            $html = curl_exec($session);
            curl_close($session);
            if ($html != '') {
                $dom = new DOMDocument();
                @$dom->loadHTML($html);
                $xpath = new DOMXPath($dom);
                // "//a" matches anchors at any depth inside <body>
                $hrefs = $xpath->evaluate("/html/body//a");
                for ($i = 0; $i < $hrefs->length; $i++) {
                    $href = $hrefs->item($i);
                    $url = $href->getAttribute('href');
                    if (!in_array($url, $ignore_array) && !in_array($url, $current_array)) {
                        // Add this URL to the current spider array
                        $current_array[] = $url;
                    }
                }
            } else {
                die('Failed connection to the URL');
            }
        } else {
            // There are already URLs in the current array
            foreach ($current_array as $url) {
                // Connect to this URL
                // Find all the links in this URL
                // Go through each URL and get more links
            }
        }
    }

The only problem is, I can't get my head around how to proceed. Could someone help me out? Basically, this function should repeat itself until everything has been found.

I'm not a PHP expert, but you seem to be over-complicating it. Keep a single array of URLs and walk it with an index, appending newly found links as you go:

    function spider($urltospider, $current_array = array(), $ignore_array = array('')) {
        if (empty($current_array)) {
            $current_array[] = $urltospider;
        }
        $cur_crawl = 0;
        // Don't use foreach here: it may not see URLs appended to the array during the loop.
        while ($cur_crawl < count($current_array)) {
            // crawl() should return all links found on the given page.
            $links_found = crawl($current_array[$cur_crawl]);
            // Add only links that are new, so no page is crawled more than once.
            $new_links = array_unique(array_diff($links_found, $current_array, $ignore_array));
            $current_array = array_merge($current_array, $new_links);
            $cur_crawl += 1;
        }
        return $current_array;
    }

Here `crawl()` is a helper you still need to write: it fetches one page and returns the links on it, which is what the first branch of your function already does.
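For reference, here is one way the index-driven loop above and the missing page-crawling helper could fit together. This is only a sketch under some assumptions: the names `extract_links()` and `fetch_html()`, the optional `$fetch` callback, and the `$maxPages` cap are all made up for illustration and do not come from the question. The callback is there so the loop can be tried without touching the network.

```php
<?php
// Hypothetical helper: return every non-empty href found in an HTML string.
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return [];                      // nothing to parse
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html);             // @ silences warnings from sloppy markup
    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Hypothetical default fetcher: download a page with cURL, '' on failure.
function fetch_html(string $url): string
{
    $session = curl_init($url);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($session, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($session);
    return is_string($html) ? $html : '';
}

// Walk $queue by index while it grows; $seen makes sure each URL is
// queued (and therefore fetched) at most once, so link cycles terminate.
function spider(string $start, array $ignore = [], ?callable $fetch = null, int $maxPages = 500): array
{
    $fetch = $fetch ?? 'fetch_html';
    $queue = [$start];
    $seen  = [$start => true];
    foreach ($ignore as $url) {
        $seen[$url] = true;             // ignored URLs are never queued
    }
    for ($i = 0; $i < count($queue) && $i < $maxPages; $i++) {
        foreach (extract_links($fetch($queue[$i])) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $queue;                      // every URL discovered, in crawl order
}
```

Calling `spider('http://example.com/')` would use cURL for each page. Note that this sketch takes `href` values as they appear: relative links are not resolved against the page URL and nothing restricts the crawl to your own sites, so you would likely want to add both before running it on a real network.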

