{"id":25077,"date":"2023-02-12T14:02:45","date_gmt":"2023-02-12T08:32:45","guid":{"rendered":"http:\/\/www.pythonpool.com\/?p=25077"},"modified":"2023-02-12T14:05:07","modified_gmt":"2023-02-12T08:35:07","slug":"web-crawling-in-python","status":"publish","type":"post","link":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/","title":{"rendered":"Unleash the Power of Web Crawling with Python"},"content":{"rendered":"\n<p>Crawling is the process of automatically retrieving information from websites, including images and other resources that are not listed on a website&#8217;s home page. Resources such as robots.txt files, form data, and other metadata available on the Internet make web crawling easier.<\/p>\n\n\n<div class=\"wp-block-image is-style-rounded\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" src=\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/web-crawling-1.jpg\" alt=\"Web Crawling in Python\" class=\"wp-image-25095\" width=\"277\" height=\"232\" srcset=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/web-crawling-1.jpg 940w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/web-crawling-1-300x251.jpg 300w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/web-crawling-1-768x644.jpg 768w\" sizes=\"(max-width: 277px) 100vw, 277px\" \/><\/figure><\/div>\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_74 counter-hierarchy ez-toc-counter ez-toc-transparent ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #990303;color:#990303\" 
xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #990303;color:#990303\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 eztoc-toggle-hide-by-default' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#Web_Crawling_or_Web_Scraping\" >Web Crawling or Web Scraping?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#Pros_of_Web_Crawling_in_Python\" >Pros of Web Crawling in Python<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#Simple_Methods_of_Web_Crawling\" >Simple Methods of Web Crawling<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#How_to_do_web_crawling_in_python\" >How to do web crawling in python?<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#How_to_use_web_crawler_on_a_website\" >How to use web crawler on a 
website?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#readytocode\" >#readytocode<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#readytocodewithdjango\" >#readytocodewithdjango<\/a><\/li><\/ul><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#Web_Crawling_in_Python_or_Javascript_what_to_choose\" >Web Crawling in Python or Javascript, what to choose ?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#FAQs\" >FAQs<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\" id=\"h-web-crawling-or-web-scraping\"><span class=\"ez-toc-section\" id=\"Web_Crawling_or_Web_Scraping\"><\/span>Web Crawling or Web Scraping?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The two terms are related but not interchangeable. Web crawling is a large-scale process that follows links from page to page, often across several websites, to discover and index content. Web scraping extracts specific data from the pages it is given. Because a crawler works its way across the web link by link, it is also called a spider or spider bot. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-pros-of-web-crawling-in-python\"><span class=\"ez-toc-section\" id=\"Pros_of_Web_Crawling_in_Python\"><\/span>Pros of Web Crawling in Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Crawling with Python offers several advantages over other methods of data extraction. Crawling through websites helps us in the following ways:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Collecting data about websites, such as their pages and links, and storing it in a database or spreadsheet.<\/li>\n\n\n\n<li>Writing software applications that crawl through websites, extract information from them, and store it in a database or spreadsheet.<\/li>\n\n\n\n<li>Gathering data for analysis or research.<\/li>\n\n\n\n<li>Finding, indexing, and retrieving data (XML or JSON files are often used to represent websites and the information they contain).<\/li>\n\n\n\n<li>Passively observing or monitoring data.<\/li>\n\n\n\n<li>A web crawler can be deployed remotely without requiring personnel access to the source site&#8217;s servers, an advantage over methods that need human interaction with their targets. This matters most when dealing with sensitive data, such as financial information or health records, which must remain private.<\/li>\n\n\n\n<li>Web crawling also allows for efficient caching, since it does not require physical access to the source site. 
<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-simple-methods-of-web-crawling\"><span class=\"ez-toc-section\" id=\"Simple_Methods_of_Web_Crawling\"><\/span>Simple Methods of Web Crawling<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Crawling can be done in the following ways:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Manually<\/strong>, by a human agent who follows links between pages on a website.<\/li>\n\n\n\n<li>Using an <strong>automated search engine crawler<\/strong> such as Googlebot or Bingbot, which crawls web pages automatically by following the links between them and other pages online. Such programs are also known as web robots.<\/li>\n<\/ul>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-do-web-crawling-in-python\"><span class=\"ez-toc-section\" id=\"How_to_do_web_crawling_in_python\"><\/span>How to do web crawling in python?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>A crawler fetches pages the same way a browser does: when a page is loaded, the client sends a request to the server asking for information about that page. 
This request includes details about what kind of document it is and what type of data it contains.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-medium\"><img decoding=\"async\" width=\"300\" height=\"251\" src=\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/web-crawling-300x251.png\" alt=\"Using a Google Bot for parsing data\" class=\"wp-image-25098\" srcset=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/web-crawling-300x251.png 300w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/web-crawling-768x644.png 768w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/web-crawling.png 940w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><figcaption class=\"wp-element-caption\"><em>Using a Google Bot for parsing data<\/em><\/figcaption><\/figure><\/div>\n\n\n<p>The server then responds with each requested resource (for example, images or documents). By repeating this for every link it finds, a crawler can collect all available resources, such as all pages on one website.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-use-web-crawler-on-a-website\"><span class=\"ez-toc-section\" id=\"How_to_use_web_crawler_on_a_website\"><\/span><strong>How to use web crawler on a website?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>First, let us understand how a website works.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When a user enters a website name, they are asking to access information from that website.<\/li>\n\n\n\n<li>Once the request reaches the correct website (the name is mapped to an IP address), the user receives an HTML file. 
This is a raw file.<\/li>\n\n\n\n<li>The raw file is not easy to read, so the browser renders it in a form the user can interpret.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-readytocode\"><span class=\"ez-toc-section\" id=\"readytocode\"><\/span><strong>#readytocode<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>You can follow this web crawling code in Python.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Install these packages in your Python environment.<\/strong><\/li>\n<\/ul>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\npip install requests\npip install html5lib\npip install beautifulsoup4\n<\/pre><\/div>\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Next, import the requests module to fetch the webpage. Here, we use the get() function, which is part of the requests module, to retrieve the page as HTML.<\/strong><\/li>\n<\/ul>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport requests\nurl = &quot;https:\/\/pythonpool.com&quot;\nr = requests.get(url, timeout=10)   # send a GET request; r holds the server's response\nr.raise_for_status()                # stop early on an HTTP error status (4xx or 5xx)\nhtmlText = r.text                   # response body decoded to a Unicode string\nprint(htmlText)                     # print the raw HTML\n<\/pre><\/div>\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Parse the data with the BeautifulSoup (bs4) module.<\/strong><\/li>\n<\/ul>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom bs4 import BeautifulSoup\nsoup = BeautifulSoup(r.content, 'html.parser')    # parse the raw response bytes\n# soup = BeautifulSoup(htmlText, 'html.parser')   # or parse the HTML stored in a variable\n# 'lxml' can be used instead of 'html.parser' if the lxml package is installed\n<\/pre><\/div>\n\n\n<ul class=\"wp-block-list\">\n<li><strong>To obtain every code element on the webpage, use the find_all() function.<\/strong><\/li>\n<\/ul>\n\n\n<div 
class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfor i in soup.find_all(&quot;code&quot;):\n    print(i.text)\ntitle = soup.title                     # the title tag of the webpage\nprint(title)\nprint(soup.find('a'))                  # the first a tag\nparas = soup.find_all('div')           # a list of all div tags\nprint(paras)\nfor i in paras:                        # print each div separately (not as a list)\n    print(i)\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# print(soup.find('p')&#x5B;'class'])      # class attribute of the first p tag\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# print(soup.find_all(class_=&quot;code-toolbar&quot;))     # find tags by class name only\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nsoup.find('element').text              # text of the first matching tag, without the markup ('element' stands for a tag name)\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfor i in soup.find_all('a', href=True):     # scrape the href attribute of anchor tags\n    print(i&#x5B;'href'])\n<\/pre><\/div>\n\n\n<p>In case you face errors while coding, you may check <a href=\"http:\/\/www.pythonpool.com\/gingerit\/\" target=\"_blank\" rel=\"noreferrer noopener\">Correct Grammatical Errors Using Python<\/a>. 
To deal with warnings while coding, <a href=\"http:\/\/www.pythonpool.com\/suppress-warnings-in-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">Suppress Warnings In Python<\/a> will also be handy.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-readytocodewithdjango\"><span class=\"ez-toc-section\" id=\"readytocodewithdjango\"><\/span>#readytocodewithdjango<span class=\"ez-toc-section-end\"><\/span><\/h4>\n\n\n\n<p>You might not be aware that Django can also be used for web crawling, together with Scrapy. The prerequisites are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A recent version of Python (Python 3 will work)<\/li>\n\n\n\n<li>Recent versions of both Django and Scrapy<\/li>\n<\/ul>\n\n\n\n<p>Now, install the required packages.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\npip install django\npip install scrapy\n<\/pre><\/div>\n\n\n<p>Create the project and give it a name.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndjango-admin startproject xyz\n<\/pre><\/div>\n\n\n<p>Create a virtual environment, then add an app with a model. 
Here, x refers to the Python version you have installed on your system.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# in the shell: create and activate a virtual environment, then add an app\npythonx -m venv .venv &amp;&amp; source .venv\/bin\/activate\npython manage.py startapp movie\n# in movie\/models.py: models is a package here\nfrom django.db import models\nclass Movie(models.Model):      # specify the fields and methods of the class\n    ...\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nThis way you can begin crawling via Django\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-web-crawling-in-python-or-javascript-what-to-choose\"><span class=\"ez-toc-section\" id=\"Web_Crawling_in_Python_or_Javascript_what_to_choose\"><\/span>Web Crawling in Python or Javascript, what to choose ?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Python Web Crawler<\/th><th>JavaScript Web Crawler<\/th><\/tr><\/thead><tbody><tr><td>It uses Python libraries like Beautiful Soup and Scrapy. To send HTTP requests to the server and process the responses, we use the requests library and lxml.<\/td><td>Here, Axios is the library used to send HTTP requests. JavaScript also offers some highly efficient packages, such as Puppeteer and Nightmare.<\/td><\/tr><tr><td>The syntax is relatively easy and quick to write.<\/td><td>Websites that are JavaScript based can be scraped well with a JavaScript web crawler. However, the syntax is more complex.<\/td><\/tr><tr><td>It is good for programmers who have just started learning the language.<\/td><td>For people who have a strong grip on a programming language or can handle queries efficiently, JavaScript is a good option. 
<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-faqs\"><span class=\"ez-toc-section\" id=\"FAQs\"><\/span>FAQs<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<div class=\"schema-faq wp-block-yoast-faq-block\"><div class=\"schema-faq-section\" id=\"faq-question-1675410931435\"><strong class=\"schema-faq-question\">How accurate is web crawling?<\/strong> <p class=\"schema-faq-answer\">Web crawling can be more thorough than other methods because a crawler can follow links through the different versions of a page in order to find every available version of each page on a site, whereas other methods typically cover one version at a time. <\/p> <\/div> <div class=\"schema-faq-section\" id=\"faq-question-1675433641086\"><strong class=\"schema-faq-question\">Do big companies have their own web crawlers?<\/strong> <p class=\"schema-faq-answer\">Yes, companies like Amazon and Microsoft have their own web crawlers. Amazon&#8217;s web crawler is called Amazonbot, and Microsoft introduced Bingbot as the web crawler for its search engine. <\/p> <\/div> <div class=\"schema-faq-section\" id=\"faq-question-1675433836704\"><strong class=\"schema-faq-question\">Can we refer to Google as a web crawler?<\/strong> <p class=\"schema-faq-answer\">Google is a search engine, but its search index is crawler based: its crawler, Googlebot, goes through sites across the web much as we do when we surf the net. 
<\/p> <\/div> <\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>In this article we learned about web crawling, a widely adopted technique for extracting data from websites, and saw how to do web crawling in Python, using the requests HTTP library to download website pages and BeautifulSoup to parse them.<\/p>","protected":false},"excerpt":{"rendered":"<p>Crawling is a term used to describe the process of retrieving information from websites, such as images or other resources that are not listed on &#8230; <\/p>\n<p class=\"read-more-container\"><a title=\"Unleash the Power of Web Crawling with Python\" class=\"read-more button\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#more-25077\" aria-label=\"More on Unleash the Power of Web Crawling with Python\">Read more<\/a><\/p>\n","protected":false},"author":38,"featured_media":25521,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[15],"tags":[5626,5623,5622,5625,5624,5627],"class_list":["post-25077","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorials","tag-web-crawler-api-python","tag-web-crawler-in-python-django","tag-web-crawler-in-python-tutorial","tag-web-crawler-python-amazon","tag-web-crawling-program-in-python","tag-what-is-web-crawling-in-python","infinite-scroll-item"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v20.1 (Yoast SEO v25.0) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Unleash the Power of 
Web Crawling with Python - Python Pool<\/title>\n<meta name=\"description\" content=\"Do you know how to go with web crawling in python by parsing html? Find out more about web crawling on pythonpool.com\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Unleash the Power of Web Crawling with Python\" \/>\n<meta property=\"og:description\" content=\"Crawling is a term used to describe the process of retrieving information from websites, such as images or other resources that are not listed on a\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/\" \/>\n<meta property=\"og:site_name\" content=\"Python Pool\" \/>\n<meta property=\"article:published_time\" content=\"2023-02-12T08:32:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-02-12T08:35:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Namrata Gulati\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@pythonpool\" \/>\n<meta name=\"twitter:site\" content=\"@pythonpool\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Namrata Gulati\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/\"},\"author\":{\"name\":\"Namrata Gulati\",\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/person\/294338f378f0853e6af4ca4a5a907ea6\"},\"headline\":\"Unleash the Power of Web Crawling with Python\",\"datePublished\":\"2023-02-12T08:32:45+00:00\",\"dateModified\":\"2023-02-12T08:35:07+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/\"},\"wordCount\":961,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.pythonpool.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp\",\"keywords\":[\"web crawler api python\",\"web crawler in python django\",\"web crawler in python tutorial\",\"web crawler python amazon\",\"web crawling program in python\",\"what is web crawling in python\"],\"articleSection\":[\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#respond\"]}]},{\"@type\":[\"WebPage\",\"FAQPage\"],\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/\",\"url\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/\",\"name\":\"Unleash the Power of Web Crawling with Python - Python 
Pool\",\"isPartOf\":{\"@id\":\"https:\/\/www.pythonpool.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp\",\"datePublished\":\"2023-02-12T08:32:45+00:00\",\"dateModified\":\"2023-02-12T08:35:07+00:00\",\"description\":\"Do you know how to go with web crawling in python by parsing html? Find out more about web crawling on pythonpool.com\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#breadcrumb\"},\"mainEntity\":[{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675410931435\"},{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433641086\"},{\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433836704\"}],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#primaryimage\",\"url\":\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp\",\"contentUrl\":\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp\",\"width\":1200,\"height\":628,\"caption\":\"Web Crawling in Python\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pythonpool.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Unleash the Power of Web Crawling with 
Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pythonpool.com\/#website\",\"url\":\"https:\/\/www.pythonpool.com\/\",\"name\":\"Python Pool\",\"description\":\"Your One-Stop Python Learning Destination\",\"publisher\":{\"@id\":\"https:\/\/www.pythonpool.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pythonpool.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pythonpool.com\/#organization\",\"name\":\"Python Pool\",\"url\":\"https:\/\/www.pythonpool.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/logo\/image\/\",\"url\":\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2020\/08\/aa.png\",\"contentUrl\":\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2020\/08\/aa.png\",\"width\":452,\"height\":185,\"caption\":\"Python Pool\"},\"image\":{\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/pythonpool\",\"https:\/\/www.youtube.com\/c\/pythonpool\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/person\/294338f378f0853e6af4ca4a5a907ea6\",\"name\":\"Namrata Gulati\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/19c5e6bfbc6202d4017b79f726b2ad5e520491d67ff428a87c071afef23ecd89?s=96&d=wavatar&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/19c5e6bfbc6202d4017b79f726b2ad5e520491d67ff428a87c071afef23ecd89?s=96&d=wavatar&r=g\",\"caption\":\"Namrata 
Gulati\"}},{\"@type\":\"Question\",\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675410931435\",\"position\":1,\"url\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675410931435\",\"name\":\"How accurate is web crawling?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"Web crawling is more accurate than other methods because it can crawl through different versions of pages in order to find every possible version of each page on a site, whereas other methods can only crawl through one version at a time.\",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"},{\"@type\":\"Question\",\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433641086\",\"position\":2,\"url\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433641086\",\"name\":\"Do the big companies have pre-curated web crawlers?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"Yes, companies like Amazon and Microsoft have their own web crawlers. Amazon's web crawler is named Amazonbot. Microsoft introduced Bingbot as the web crawler for its search engine.\",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"},{\"@type\":\"Question\",\"@id\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433836704\",\"position\":3,\"url\":\"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433836704\",\"name\":\"Can we refer to Google as a web crawler?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"Yes, the search index is crawler-based. When we surf the net, we tend to go through several sites.\",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Unleash the Power of Web Crawling with Python - Python Pool","description":"Do you know how to go with web crawling in python by parsing html? 
Find out more about web crawling on pythonpool.com","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/","og_locale":"en_US","og_type":"article","og_title":"Unleash the Power of Web Crawling with Python","og_description":"Crawling is a term used to describe the process of retrieving information from websites, such as images or other resources that are not listed on a","og_url":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/","og_site_name":"Python Pool","article_published_time":"2023-02-12T08:32:45+00:00","article_modified_time":"2023-02-12T08:35:07+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp","type":"image\/webp"}],"author":"Namrata Gulati","twitter_card":"summary_large_image","twitter_creator":"@pythonpool","twitter_site":"@pythonpool","twitter_misc":{"Written by":"Namrata Gulati","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#article","isPartOf":{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/"},"author":{"name":"Namrata Gulati","@id":"https:\/\/www.pythonpool.com\/#\/schema\/person\/294338f378f0853e6af4ca4a5a907ea6"},"headline":"Unleash the Power of Web Crawling with Python","datePublished":"2023-02-12T08:32:45+00:00","dateModified":"2023-02-12T08:35:07+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/"},"wordCount":961,"commentCount":0,"publisher":{"@id":"https:\/\/www.pythonpool.com\/#organization"},"image":{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#primaryimage"},"thumbnailUrl":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp","keywords":["web crawler api python","web crawler in python django","web crawler in python tutorial","web crawler python amazon","web crawling program in python","what is web crawling in python"],"articleSection":["Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.pythonpool.com\/web-crawling-in-python\/#respond"]}]},{"@type":["WebPage","FAQPage"],"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/","url":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/","name":"Unleash the Power of Web Crawling with Python - Python Pool","isPartOf":{"@id":"https:\/\/www.pythonpool.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#primaryimage"},"image":{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#primaryimage"},"thumbnailUrl":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp","datePublished":"2023-02-12T08:32:45+00:00","dateModified":"2023-02-12T08:35:07+00:00","description":"Do you know how to go with web crawling 
in python by parsing html? Find out more about web crawling on pythonpool.com","breadcrumb":{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#breadcrumb"},"mainEntity":[{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675410931435"},{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433641086"},{"@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433836704"}],"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pythonpool.com\/web-crawling-in-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#primaryimage","url":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp","contentUrl":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2023\/02\/Web-Crawling-in-Python.webp","width":1200,"height":628,"caption":"Web Crawling in Python"},{"@type":"BreadcrumbList","@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pythonpool.com\/"},{"@type":"ListItem","position":2,"name":"Unleash the Power of Web Crawling with Python"}]},{"@type":"WebSite","@id":"https:\/\/www.pythonpool.com\/#website","url":"https:\/\/www.pythonpool.com\/","name":"Python Pool","description":"Your One-Stop Python Learning Destination","publisher":{"@id":"https:\/\/www.pythonpool.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pythonpool.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.pythonpool.com\/#organization","name":"Python 
Pool","url":"https:\/\/www.pythonpool.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pythonpool.com\/#\/schema\/logo\/image\/","url":"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2020\/08\/aa.png","contentUrl":"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2020\/08\/aa.png","width":452,"height":185,"caption":"Python Pool"},"image":{"@id":"https:\/\/www.pythonpool.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/pythonpool","https:\/\/www.youtube.com\/c\/pythonpool"]},{"@type":"Person","@id":"https:\/\/www.pythonpool.com\/#\/schema\/person\/294338f378f0853e6af4ca4a5a907ea6","name":"Namrata Gulati","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pythonpool.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/19c5e6bfbc6202d4017b79f726b2ad5e520491d67ff428a87c071afef23ecd89?s=96&d=wavatar&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/19c5e6bfbc6202d4017b79f726b2ad5e520491d67ff428a87c071afef23ecd89?s=96&d=wavatar&r=g","caption":"Namrata Gulati"}},{"@type":"Question","@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675410931435","position":1,"url":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675410931435","name":"How accurate is web crawling?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"Web crawling is more accurate than other methods because it can crawl through different versions of pages in order to find every possible version of each page on a site, whereas other methods can only crawl through one version at a time.
","inLanguage":"en-US"},"inLanguage":"en-US"},{"@type":"Question","@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433641086","position":2,"url":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433641086","name":"Do the big companies have pre-curated web crawlers?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"Yes, companies like Amazon and Microsoft have their own web crawlers. Amazon's web crawler is named Amazonbot. Microsoft introduced Bingbot as the web crawler for its search engine.","inLanguage":"en-US"},"inLanguage":"en-US"},{"@type":"Question","@id":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433836704","position":3,"url":"https:\/\/www.pythonpool.com\/web-crawling-in-python\/#faq-question-1675433836704","name":"Can we refer to Google as a web crawler?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"Yes, the search index is crawler-based. When we surf the net, we tend to go through several sites.
","inLanguage":"en-US"},"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/posts\/25077","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/users\/38"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/comments?post=25077"}],"version-history":[{"count":40,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/posts\/25077\/revisions"}],"predecessor-version":[{"id":25522,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/posts\/25077\/revisions\/25522"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/media\/25521"}],"wp:attachment":[{"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/media?parent=25077"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/categories?post=25077"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/tags?post=25077"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}