- Stream Notes
- Coding
- Did some XPath digging for MeaLeon/scraping
- AllRecipes data cleaning:
- Need to send a different User-Agent header:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0'}
- [x] Fix the URLs
logseq.order-list-type:: number
- URL checking: if the response status is not 200, drop the recipe or flag it to be dropped
logseq.order-list-type:: number
- If the status is 200:
- a. Get response.xpath('//*[@id="mntl-universal-breadcrumbs_1-0"]/li/a/span/text()').getall()
- b. Check if "Global Cuisines" is in that list; if so, keep the whole list
- Otherwise, the cuisine tag is ["Missing Cuisine"]
- However, AllRecipes may not file American recipes under "Global Cuisines", so those could end up tagged as missing
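- A minimal sketch of the status check and cuisine tagging steps above, assuming a Scrapy-style response object (anything with a `.status` attribute and `.xpath(...).getall()`). The names `HEADERS`, `BREADCRUMB_XPATH`, `cuisine_from_breadcrumbs`, and `check_recipe`, and the shape of the returned dict, are made up for illustration and are not existing MeaLeon code.

```python
# User-Agent header from the notes; the constant name is hypothetical.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) "
        "Gecko/20100101 Firefox/50.0"
    )
}

# Breadcrumb XPath from the notes, written with straight quotes.
BREADCRUMB_XPATH = '//*[@id="mntl-universal-breadcrumbs_1-0"]/li/a/span/text()'


def cuisine_from_breadcrumbs(breadcrumbs):
    """Keep the whole breadcrumb list if it mentions "Global Cuisines"."""
    if "Global Cuisines" in breadcrumbs:
        return list(breadcrumbs)
    # Fallback tag from the notes.
    return ["Missing Cuisine"]


def check_recipe(response):
    """Flag non-200 responses for dropping; otherwise tag the cuisine.

    `response` is assumed to behave like a Scrapy response.
    """
    if response.status != 200:
        # Flag instead of deleting so the drop can happen in a later step.
        return {"drop": True, "cuisine": None}
    breadcrumbs = response.xpath(BREADCRUMB_XPATH).getall()
    return {"drop": False, "cuisine": cuisine_from_breadcrumbs(breadcrumbs)}
```

- With this sketch, a recipe whose breadcrumbs lack "Global Cuisines" gets `["Missing Cuisine"]`, which is where American recipes might land. `HEADERS` would be passed when making the request, for example `scrapy.Request(url, headers=HEADERS)`.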
- We raided TypeErrorDev
Socials
Related