{"id":57594,"date":"2017-07-08T19:35:11","date_gmt":"2017-07-08T17:35:11","guid":{"rendered":"https:\/\/aulatina.com\/?p=18534"},"modified":"2025-09-29T12:44:18","modified_gmt":"2025-09-29T10:44:18","slug":"what-is-web-scraping","status":"publish","type":"post","link":"https:\/\/www.aulatina.com\/en\/que-es-web-scraping\/","title":{"rendered":"What is Web Scraping?"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Contents\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #303030;color:#303030\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #303030;color:#303030\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/www.aulatina.com\/en\/que-es-web-scraping\/#Introduccion\" >Introduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.aulatina.com\/en\/que-es-web-scraping\/#Herramientas_de_web_scraping\" >Web scraping tools<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.aulatina.com\/en\/que-es-web-scraping\/#Web_scraping_legitimo\" >Legitimate web scraping<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.aulatina.com\/en\/que-es-web-scraping\/#Web_scraping_malicioso\" >Malicious web scraping<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.aulatina.com\/en\/que-es-web-scraping\/#Aumente_sus_ventas_con_web_scraping\" >Increase your sales with web scraping<\/a><\/li><\/ul><\/nav><\/div>\n<div class=\"wpb-content-wrapper\"><div class=\"vc_row wpb_row vc_row-fluid justify vc_custom_1669290739427\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<h2><span class=\"ez-toc-section\" id=\"Introduccion\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Web scraping is the process of using bots to extract content and data from a website. In this article we try to answer the question: what is web scraping?<\/p>\n<p>Web scraping extracts the underlying HTML code and, with it, the data stored in a database. The scraper can then copy or replicate the entire content of the website elsewhere.<\/p>\n<p>Web scraping is used in a variety of digital businesses that rely on data collection. In principle, it is not a bad thing or something to avoid. 
It can help us expand our online business. Some examples:<\/p>\n<p>Search engine bots crawl a website, analyse its content and then rank it. Googlebot is one example: we want it to crawl our website so that it can index it, especially if we have optimised the site for SEO.<\/p>\n<p>Price comparison sites deploy bots to automatically fetch prices and product descriptions from affiliated sellers' websites. Typical examples are price comparison portals for hotels, insurance, etc.<\/p>\n<p>Market research companies use bots to extract data from forums and social media (e.g. for social analysis or to study usage habits).<\/p>\n<p>However, web scraping is also used for illegal purposes, such as stealing copyrighted content or spying on competitors.<\/p>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div id=\"separacion-h2\" class=\"vc_row wpb_row vc_row-fluid separacion-h2 vc_custom_1668948324779\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<h2 class=\"h2A\">What is Web Scraping?<\/h2>\n<h2 class=\"h2B\">What is Web Scraping?<\/h2>\n<div class=\"h2separador\"><\/div>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div id=\"table-of-contents-1\" class=\"vc_row wpb_row vc_row-fluid justify vc_custom_1669290778910\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<h2><span class=\"ez-toc-section\" id=\"Herramientas_de_web_scraping\"><\/span>Web scraping tools<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Web scraping tools are software (i.e. bots) programmed to sift through databases and extract information. 
A variety of bot types are used, many of them fully customisable to: recognise <a href=\"https:\/\/www.aulatina.com\/en\/structuring-your-site-using-divs-and-spans-in-html\/\">HTML site structures<\/a>, extract and transform the content of a website, store the collected data, and extract data from APIs.<\/p>\n<p>Since all scraping bots have the same purpose - to access website data - it can be difficult to distinguish between legitimate and malicious bots. However, some key differences help tell the two types apart:<\/p>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div class=\"vc_row wpb_row vc_row-fluid separacion-h2 vc_custom_1668948324779\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<h2 class=\"h2A\">What is Web Scraping?<\/h2>\n<h2 class=\"h2B\">What is Web Scraping?<\/h2>\n<div class=\"h2separador\"><\/div>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div id=\"table-of-contents-2\" class=\"vc_row wpb_row vc_row-fluid justify vc_custom_1669290815667\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<h2><span class=\"ez-toc-section\" id=\"Web_scraping_legitimo\"><\/span>Legitimate web scraping<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Legitimate bots identify themselves with the organisation they are scraping for. For example, Googlebot identifies itself in its HTTP User-Agent header as belonging to Google.<\/p>\n<p>Legitimate bots respect a site's robots.txt file, which lists the pages a bot is allowed to access and those it is not.<\/p>\n<p>The resources required to run scraping bots are substantial. 
So much so that legitimate scraping operators invest heavily in servers to process the large volumes of data they extract.<\/p>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div class=\"vc_row wpb_row vc_row-fluid separacion-h2 vc_custom_1668948324779\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<h2 class=\"h2A\">What is Web Scraping?<\/h2>\n<h2 class=\"h2B\">What is Web Scraping?<\/h2>\n<div class=\"h2separador\"><\/div>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div id=\"table-of-contents-3\" class=\"vc_row wpb_row vc_row-fluid justify vc_custom_1669290881825\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<h2><span class=\"ez-toc-section\" id=\"Web_scraping_malicioso\"><\/span>Malicious web scraping<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Web scraping is considered malicious when data is extracted without the website owner's permission. Malicious bots impersonate legitimate traffic by sending a fake HTTP User-Agent header. They also crawl the website regardless of what the site administrator has allowed.<\/p>\n<p>The two most common use cases for malicious web scraping are\u00a0<em>price scraping<\/em>\u00a0and\u00a0<em>content theft<\/em>.<\/p>\n<p>In\u00a0<em>price scraping<\/em>, a botnet is usually used. From this network, scraper bots are launched to probe the databases of competing businesses. 
The main aim is to obtain pricing information.<\/p>\n<p>These attacks are common in industries where products are easily comparable and price plays an important role in consumers' purchasing decisions.<\/p>\n<p>Victims of price scraping include travel agencies, ticket sellers and online retailers. The attacker's goal is to gain an advantage over competitors: a vendor can use a bot to continuously scrape its competitors' websites and instantly update its own prices.<\/p>\n<p><em>Content scraping<\/em>\u00a0involves large-scale content theft from a given website. Typical targets are online product catalogues and websites that rely on digital content to drive their business. For example, local online business directories invest significant amounts of time, money and energy to build their content database. Scraping can result in all of that content being harvested and then used in spam campaigns or resold to competitors.<\/p>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div class=\"vc_row wpb_row vc_row-fluid vc_custom_1657102698850\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\"><div class=\"vc_message_box vc_message_box-standard vc_message_box-rounded vc_color-info vc_custom_1657103204748\" ><div class=\"vc_message_box-icon\"><i class=\"fa fa-solid fa-circle-info\"><\/i><\/div><p>We provide services nationwide, and our headquarters are located in the centre of <a href=\"https:\/\/www.google.com\/maps\/place\/Aulatina,+Dise%C3%B1o+Web+M%C3%A1laga\/@36.7194334,-4.422429,15z\/data=!4m2!3m1!1s0x0:0x9a601ad8d190d429?sa=X&amp;ved=2ahUKEwjG5Mqw3qL2AhWNSvEDHWQvBdQQ_BJ6BAgbEAU\" target=\"_blank\" rel=\"noopener\">Malaga<\/a>. You can contact us by <a href=\"https:\/\/api.whatsapp.com\/send?phone=34695551758&amp;text=Presupuesto\" target=\"_blank\" rel=\"noopener\">WhatsApp 1<\/a> or <a 
href=\"https:\/\/api.whatsapp.com\/send?phone=34665552790&amp;text=Presupuesto\" target=\"_blank\" rel=\"noopener\">WhatsApp 2<\/a>, by calling either of these two mobile numbers, (+34) 695 551 758 or (+34) 695 552 790, or through our <a href=\"\/en\/contact\/\">contact<\/a> form.<\/p>\n<\/div><\/div><\/div><\/div><\/div><div id=\"table-of-contents-4\" class=\"vc_row wpb_row vc_row-fluid justify vc_custom_1669290930289\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<h2><span class=\"ez-toc-section\" id=\"Aumente_sus_ventas_con_web_scraping\"><\/span>Increase your sales with web scraping<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Having seen the two types of scraping (legitimate and malicious), let's return to the legitimate kind. For example, suppose you have an online shop and want to connect it to Google Merchant Center or solostock.com. With this technique you can publish your products on those websites simply by drawing on your original site.<\/p>\n<p>The other sites will be updated automatically whenever you update yours. 
And you won't need to spend more time and effort on the others.<\/p>\n<p>Therefore, from Aulatina, we can work legitimate web scraping on your website so that you can increase your sales and visibility on the Internet.<\/p>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div data-vc-full-width=\"true\" data-vc-full-width-temp=\"true\" data-vc-full-width-init=\"false\" class=\"vc_row wpb_row vc_row-fluid vc_custom_1669290953941 vc_row-has-fill vc_row-o-content-middle vc_row-flex\"><div class=\"wpb_column vc_column_container vc_col-sm-6\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\"><div id=\"ultimate-heading-20486a2fbe8c52d09\" class=\"uvc-heading ult-adjust-bottom-margin ultimate-heading-20486a2fbe8c52d09 uvc-2665\" data-hspacer=\"no_spacer\"  data-halign=\"left\" style=\"text-align:left\"><div class=\"uvc-heading-spacer no_spacer\" style=\"top\"><\/div><div class=\"uvc-main-heading ult-responsive\"  data-ultimate-target='.uvc-heading.ultimate-heading-20486a2fbe8c52d09 h4'  data-responsive-json-new='{\"font-size\":\"desktop:45px;tablet_portrait:34px;\",\"line-height\":\"desktop:50px;tablet_portrait:44px;\"}' ><h4 style=\"--font-weight:theme;color:#ffffff;\">Let's work together<\/h4><\/div><div class=\"uvc-sub-heading ult-responsive\"  data-ultimate-target='.uvc-heading.ultimate-heading-20486a2fbe8c52d09 .uvc-sub-heading '  data-responsive-json-new='{\"font-size\":\"desktop:25px;tablet_portrait:20px;\",\"line-height\":\"desktop:30px;tablet_portrait:30px;\"}'  style=\"font-weight:bold;color:#ffffff;\">We would love to hear about your next project and show you how we can help.<\/div><\/div><\/div><\/div><\/div><div class=\"wpb_column vc_column_container vc_col-sm-6\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\"><style type=\"text\/css\" data-type=\"the7_shortcodes-inline-css\">#default-btn-fe066e1174e3a4e558f04bc44e1e27ce {\n  border-radius: 0px;\n  font-size: 24px;\n  line-height: 
26px;\n}\n.btn-shadow#default-btn-fe066e1174e3a4e558f04bc44e1e27ce {\n  box-shadow: 0 1px 6px rgba(0,0,0,0.12);\n  transition: box-shadow 0.2s ease-out, opacity 0.45s;\n}\n.btn-shadow#default-btn-fe066e1174e3a4e558f04bc44e1e27ce:hover {\n  box-shadow: 0 5px 11px 0 rgba(0,0,0,0.18), 0 4px 15px 0 rgba(0,0,0,0.15);\n}\n.btn-3d#default-btn-fe066e1174e3a4e558f04bc44e1e27ce:hover {\n  box-shadow: 0px 2px 0px 0px #e0e0e0;\n}\n.btn-flat#default-btn-fe066e1174e3a4e558f04bc44e1e27ce {\n  box-shadow: none;\n}\n.btn-flat#default-btn-fe066e1174e3a4e558f04bc44e1e27ce:hover {\n  box-shadow: none;\n}\n#default-btn-fe066e1174e3a4e558f04bc44e1e27ce.ico-right-side > i {\n  margin-right: 0px;\n  margin-left: 8px;\n}\n#default-btn-fe066e1174e3a4e558f04bc44e1e27ce > i {\n  margin-right: 8px;\n  font-size: 24px;\n}\n#default-btn-fe066e1174e3a4e558f04bc44e1e27ce:not(:hover) {\n  border-width: 0px;\n  color: #ffffff;\n  padding: 26px 10px 26px 10px;\n}\n#default-btn-fe066e1174e3a4e558f04bc44e1e27ce:not(:hover) * {\n  color: #ffffff;\n}\n#default-btn-fe066e1174e3a4e558f04bc44e1e27ce:hover {\n  border-width: 0px;\n  color: #000000;\n  background: #ffffff !important;\n  padding: 26px 10px 26px 10px;\n}\n#default-btn-fe066e1174e3a4e558f04bc44e1e27ce:hover * {\n  color: #000000;\n}\n#default-btn-fe066e1174e3a4e558f04bc44e1e27ce.ico-right-side > i {\n  margin-right: 0px;\n  margin-left: 8px;\n}\n#default-btn-fe066e1174e3a4e558f04bc44e1e27ce > i {\n  margin-right: 8px;\n}<\/style><div class=\"btn-align-center\"><a href=\"\/en\/contact\/\" class=\"default-btn-shortcode dt-btn link-hover-off full-width-btn btn-flat\" id=\"default-btn-fe066e1174e3a4e558f04bc44e1e27ce\" title=\"Mobile application development\"><i class=\"icomoon-the7-font-the7-arrow-29\"><\/i><span>Ask us for a quote<\/span><\/a><\/div><\/div><\/div><\/div><\/div><div class=\"vc_row-full-width vc_clearfix\"><\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Web Scraping is the process of extracting content and data from 
web pages using bots. Find out what it is and how this information extraction technique is applied.<\/p>","protected":false},"author":1,"featured_media":57597,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[1],"tags":[],"class_list":["post-57594","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","category-1","description-off"],"_links":{"self":[{"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/posts\/57594","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/comments?post=57594"}],"version-history":[{"count":0,"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/posts\/57594\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/media\/57597"}],"wp:attachment":[{"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/media?parent=57594"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/categories?post=57594"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aulatina.com\/en\/wp-json\/wp\/v2\/tags?post=57594"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}