Making the Crawler Fast and Reliable
So, as I mentioned in the previous article, the basic version of the Crawler worked well and proved to be usable. The problem: it was slow and unstable.
To make it fast, we need to run it on multiple machines (about 5 to 20). And to make it stable, we need to figure out how to build a reliable system from unreliable components.
Using multiple machines instead of just one makes things a bit more complex, because a couple of issues arise:
- Provisioning and deploying a couple dozen machines. I don't want to do this by hand (see the deployment sketch after this list).
- Handling machine crashes and healing afterwards. The Crawler should be robust and continue working if one of its nodes crashes, and pick the work back up when that node is fixed or a new node is added.
- Detecting when one of its nodes has hung and needs to be rebooted (a heartbeat-based sketch covering this and the previous point follows the list).
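
For the first point, even a small script beats doing it by hand. Here is a minimal deployment sketch: the host names, paths, and restart command are assumptions for illustration, not the actual setup, and it relies only on plain `rsync` and `ssh`.

```python
#!/usr/bin/env python3
"""Deploy the crawler to a list of worker machines over SSH.

A minimal sketch: host names, paths, and the restart command are
hypothetical, not the actual setup from this article.
"""
import subprocess
import sys

# Hypothetical worker hosts; in practice this list would live in a config file.
HOSTS = [f"crawler-{i:02d}.example.com" for i in range(1, 11)]

LOCAL_BUILD = "./build/"      # assumed local directory with the crawler code
REMOTE_DIR = "/opt/crawler/"  # assumed install location on each node
RESTART_CMD = "sudo systemctl restart crawler"  # assumed service name

def deploy(host: str) -> None:
    # Copy the build to the node, then restart the crawler service.
    subprocess.run(
        ["rsync", "-az", "--delete", LOCAL_BUILD, f"{host}:{REMOTE_DIR}"],
        check=True,
    )
    subprocess.run(["ssh", host, RESTART_CMD], check=True)

def main() -> None:
    failed = []
    for host in HOSTS:
        try:
            deploy(host)
            print(f"deployed to {host}")
        except subprocess.CalledProcessError as e:
            # Keep going: one broken node should not block the whole rollout.
            print(f"FAILED on {host}: {e}", file=sys.stderr)
            failed.append(host)
    sys.exit(1 if failed else 0)

if __name__ == "__main__":
    main()
```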
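
The last two points can be handled with the same mechanism: heartbeats. Each node periodically checks in with a coordinator; a node that stays silent too long is treated as crashed or hung, its work is reassigned to the survivors, and a reboot is requested. The sketch below is an in-memory illustration of that idea under assumed names and timeouts, not the actual implementation; `request_reboot` is a hypothetical hook (e.g. a call to the hosting provider's API).

```python
"""Heartbeat-based health checking for crawler nodes (a minimal sketch)."""
import time
from dataclasses import dataclass, field

# Seconds without a heartbeat before a node is considered dead (assumed value).
HEARTBEAT_TIMEOUT = 60.0

@dataclass
class Node:
    name: str
    last_heartbeat: float = field(default_factory=time.monotonic)
    alive: bool = True

class Coordinator:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.tasks: dict[str, str] = {}  # task id -> owning node name

    def heartbeat(self, name: str) -> None:
        # Called whenever a node checks in; this is also how a fixed
        # or newly added node (re)joins the cluster.
        node = self.nodes.setdefault(name, Node(name))
        node.last_heartbeat = time.monotonic()
        node.alive = True

    def check_nodes(self) -> None:
        # Run periodically: detect crashed or hung nodes, reassign
        # their tasks, and ask for a reboot.
        now = time.monotonic()
        for node in self.nodes.values():
            if node.alive and now - node.last_heartbeat > HEARTBEAT_TIMEOUT:
                node.alive = False
                self.reassign_tasks(node.name)
                self.request_reboot(node.name)

    def reassign_tasks(self, dead: str) -> None:
        survivors = [n.name for n in self.nodes.values() if n.alive]
        for task, owner in self.tasks.items():
            if owner == dead and survivors:
                # Naive reassignment; a real system would rebalance properly.
                self.tasks[task] = survivors[hash(task) % len(survivors)]

    def request_reboot(self, name: str) -> None:
        # Hypothetical hook: in practice, call the provider's API here.
        print(f"node {name} is silent, asking for a reboot")
```

The nice property of this design is that "crashed", "hung", and "network-partitioned" all look the same to the coordinator, so one timeout handles every failure mode at once.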