Crawling without BeautifulSoup
Crawling pt.3
Today's GitHub blog topic is crawling without the BeautifulSoup module. I generally use BeautifulSoup when crawling. Unfortunately, many websites block crawlers for information security reasons. To get around the block, use a crawling technique meant for dynamic web pages.
Press the F12 key on your keyboard and find the referer web page address and the user agent in the Network panel. Then write the Python crawling code as follows:
url = '[web address to crawl]'
With the referer and user agent set, you can access a dynamic web page that JavaScript (JS) would otherwise block.
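The steps above can be sketched roughly as follows. This is only a minimal illustration, not the original code from this post: it assumes the standard library's urllib.request as the HTTP client, and the names build_request and fetch, along with the example.com addresses and the user agent string, are hypothetical placeholders. Swap in the values you actually see in the Network panel.

```python
import urllib.request

# Hypothetical placeholder values: replace them with the ones shown
# in the Network panel (F12) for the page you want to crawl.
URL = "https://example.com/data"
REFERER = "https://example.com/"
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, referer, user_agent):
    """Return a Request that carries the Referer and User-Agent headers."""
    return urllib.request.Request(
        url,
        headers={"Referer": referer, "User-Agent": user_agent},
    )


def fetch(url, referer, user_agent, timeout=10):
    """Send the request and return the response body decoded as text."""
    request = build_request(url, referer, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (performs a real network request):
# html = fetch(URL, REFERER, USER_AGENT)
```

Keeping the request construction separate from the network call makes it easy to check which headers will be sent before contacting the site. Whether a given site accepts the request still depends on its own rules, so treat the header values as something to adjust per site.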