Crawling without BeautifulSoup
Crawling pt.3
Today's GitHub blog topic is crawling without the BeautifulSoup module. Generally, when I crawl, I use the BeautifulSoup module. Unfortunately, many websites treat crawling as an information security problem, so the crawler gets blocked. To avoid being blocked, use the crawling technique for dynamic web pages.
Press the F12 key on your keyboard and find the referer web page address and the user agent in the network panel. Write the Python code for the crawl as follows:
url = '[web address to crawl]'
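With the referer and user agent set, the request looks like it comes from a browser, so you can often access a dynamic web page that would otherwise block the crawler with JavaScript (JS).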
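The post only shows the first line of the code, so here is a minimal sketch of how the referer and user agent copied from the network panel could be attached to a request. It assumes Python's standard urllib.request module rather than whatever library the original code used, and the function names build_request and fetch_html are hypothetical helpers made up for this illustration.

```python
import urllib.request


def build_request(url, referer, user_agent):
    """Build a GET request carrying the Referer and User-Agent headers.

    Both header values are assumed to be copied by hand from the
    browser's network panel (F12).
    """
    headers = {
        "Referer": referer,
        "User-Agent": user_agent,
    }
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, referer, user_agent, timeout=10):
    """Send the request and return the response body as text."""
    req = build_request(url, referer, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (fill in the placeholders with your own values first):
# url = '[web address to crawl]'
# html = fetch_html(url, '[referer web page address]', '[user agent]')
```

The returned text is the raw HTML, so this only helps when the block depends on the request headers. A page that builds its content with JavaScript after loading would still need a different approach.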