Lessons and skills learned

  • Basic Scrapy usage
  • In Python 3, print is a function, so its arguments must always be wrapped in parentheses
  • Basic XPath knowledge
  • Python yield: this keyword appears in generator functions. When the generator runs and reaches a yield expression, it pauses and returns the value. The next time a value is requested from the generator, execution resumes from where it left off.
  • Python yield is very useful when each item only needs to be read once. It avoids building a full list in memory, which matters especially when there is a huge number of items.
  • In a Scrapy project, the project name and the spider name should not be the same. Otherwise, importing modules in some files will fail.
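
The two notes on yield can be illustrated with a small sketch. The function name read_items and the item counts below are made up for the example and do not come from the Scrapy project; it only shows the pause-and-resume behavior and a rough size comparison against a full list.

```python
import sys


def read_items(n):
    """Hypothetical generator: produces n squared numbers one at a time."""
    for i in range(n):
        # Execution pauses here and hands the value to the caller.
        # The next request resumes right after this line.
        yield i * i


gen = read_items(3)
first = next(gen)    # runs until the first yield
second = next(gen)   # resumes where the generator left off
rest = list(gen)     # consumes whatever items remain

# Rough memory comparison: a full list versus a generator object.
# sys.getsizeof reports only the container itself, not the items it holds,
# so the real cost of the list is even larger than shown.
as_list = [i * i for i in range(100_000)]
as_gen = read_items(100_000)
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

# The generator can still be read once, item by item.
total = sum(as_gen)

print(first, second, rest)
print(list_bytes > gen_bytes)
```

A generator can only be consumed once, so this fits the "read each item once" case from the notes. If the items are needed again, the generator has to be recreated or the results stored in a list.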