The latest version for this tutorial is available here.

Retry is a feature in Octoparse that reloads a web page based on a condition you specify. When a page does not load normally, Octoparse has trouble scraping data from it and even executing the next actions. In that case, Octoparse needs to retry loading the page before starting the extraction.

The Retry setting is available only in the three page-loading actions in a workflow: Go To Web Page, Click Item, and Click to Paginate. Octoparse needs a condition to tell whether the page has loaded normally, so that it can retry loading the page when the load fails. Select the Loop Item/Cycle Pages box in the workflow, tick the "Retry when" box, then click to configure the condition.

Configure the "URL/content/element(XPath) contains" option together with the "Contains/Does not contain" option. Usually, when a load fails, the web page responds with a message in the URL or page content that indicates what happened, such as "/errors", "500 Internal Server Error", or "Too many requests". Enter such a string in the textbox as the condition and select "Contains"; Octoparse will then retry loading the page whenever it detects that string in the URL or content of the current page. Alternatively, you can enter the XPath of an element that appears only when the page has loaded normally and select "Does not contain". As a result, once Octoparse does not detect the element at that XPath on the current page, it reloads the page. Enter the XPath into the "Single element" text box (see the screenshot below). You can click to add multiple conditions for Octoparse to use in its judgment.

Finally, set up "Maximum reload times" and the interval time. To keep Octoparse from getting stuck endlessly reloading the page, you need to set a maximum number of retries. When Octoparse reaches the maximum number of retries, it stops retrying and moves on to the next step.

Sometimes you might find that a task extracts only the first item and stops looping through the other items. In effect, Octoparse cannot return to the list page after scraping data from the detail page. You can follow these steps to check manually whether Octoparse can return to the list page once it enters the detail page:

1. Click "Go To Web Page" to open the website.
2. Click the "Loop Item" box in your workflow.
3. Click "Click Item" to open the detail page.
4. Click "Loop Item" again and see whether Octoparse can return to the list page.

If it cannot, you should revise your workflow. This issue can be divided into two situations.

1) The detail page is not set to open in a new tab. Click "Click Item" and you will find an advanced option named "New Tab". Enable "New Tab" and re-create the following steps. Do remember to re-create the steps after "Click Item", because Octoparse needs to identify the new page for extraction and the previous steps will not work.

2) The website is not compatible with Octoparse. If Octoparse still does not work even after you enable "New Tab", consider a compatibility issue between Octoparse and the website you want to scrape. In this case, divide your task into two steps: extract the detail-page URLs with Octoparse first, then scrape the data you want using the URL list. If you are new to URL-list extraction, please follow this video tutorial to learn more.

One more tip on selecting list items: when you select the first title, Octoparse may find only 3 similar elements. Continue by selecting the 4th title and you will see Octoparse find more elements. If, after clicking several elements, Octoparse still cannot locate all of them, create the loop item with the elements selected so far, and then modify the XPath of the loop item.
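The retry behaviour described above — check the loaded page against an error condition, then reload up to a maximum number of times with a pause in between — can be sketched in plain Python. This is an illustrative sketch of the idea, not Octoparse's implementation; the `fetch` callback and the marker list are assumptions supplied by the caller.

```python
import time

# Error markers that indicate a failed load (the same idea as the
# "URL/content contains" condition in Octoparse).
ERROR_MARKERS = ["/errors", "500 Internal Server Error", "Too many requests"]

def load_with_retry(fetch, url, max_retries=3, interval=2.0):
    """Reload `url` until no error marker is found in the page content,
    giving up after `max_retries` retries.

    `fetch` is a caller-supplied (hypothetical) function that takes a
    URL and returns the page content as a string.
    """
    content = fetch(url)
    retries = 0
    while any(marker in content for marker in ERROR_MARKERS):
        if retries >= max_retries:
            break  # reached "Maximum reload times": move on to the next step
        time.sleep(interval)  # wait the configured interval before retrying
        content = fetch(url)
        retries += 1
    return content
```

An XPath-based condition ("Does not contain" an element that only appears on a good page) would simply invert the check: retry while the expected element is absent.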
If you have more questions when using Octoparse, you could join Octoparse Users.
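The two-step workaround described earlier for incompatible websites — collect the detail-page URLs first, then scrape each URL in a second pass — can likewise be sketched in Python. The helpers below (`extract_detail_urls`, the `/detail/` URL pattern, `fetch_and_parse`) are hypothetical placeholders for illustration, not Octoparse APIs.

```python
# Sketch of the two-step approach: step 1 builds a URL list from the
# list page, step 2 scrapes each detail page found in that list.

def extract_detail_urls(list_page_html):
    """Step 1: pull detail-page URLs out of the list page.
    A naive href scan stands in for Octoparse's URL-list extraction."""
    urls = []
    for chunk in list_page_html.split('href="')[1:]:
        url = chunk.split('"', 1)[0]
        if "/detail/" in url:  # assumed detail-page URL pattern
            urls.append(url)
    return urls

def scrape_details(urls, fetch_and_parse):
    """Step 2: run a second pass over the URL list.
    `fetch_and_parse` is a caller-supplied function returning one record."""
    return [fetch_and_parse(url) for url in urls]
```

Splitting the work this way means the second task never has to navigate back from a detail page to the list page, which is exactly what fails on incompatible sites.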