wspider (v2.0)
Probably one of the fastest crawlers you've ever seen
This is a newer version of the former wmirror, which is now archived. I will spend a lot more time on wspider to make it more powerful. wmirror downloaded with a single thread using the old wget, while wspider lets you download with multiple threads. wspider is extremely small and fast. I will do my best to add support for authentication as soon as possible, so you can crawl sites you are logged in to.
wspider is EXTREMELY powerful and fast. It uses wget2, which in many cases downloads much faster than wget 1.x thanks to HTTP/2, HTTP compression, parallel connections, and use of the If-Modified-Since HTTP header.
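To give a rough idea of what a multi-threaded wget2 download looks like, here is a minimal sketch. It is not taken from wspider.sh; the function name `wget2_cmd` is hypothetical, and the URL and directory are placeholder values. It only prints a command line built from wget2's `--recursive`, `--max-threads` and `--directory-prefix` options, so nothing is downloaded until you run the printed command yourself.

```shell
# Illustrative only: prints (but does not run) a multi-threaded wget2 command.
# wget2_cmd is a hypothetical helper, not part of wspider.
wget2_cmd() {
    # $1 = URL, $2 = destination directory, $3 = number of threads
    printf 'wget2 --recursive --max-threads=%s --directory-prefix=%s %s\n' "$3" "$2" "$1"
}

# Placeholder values; replace with your own.
wget2_cmd "https://example.com" "/tmp/mirror" 10
```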
Be careful and use wspider at your own risk, as it can be extremely fast!
Happy crawling! ;)
wspider (250 threads)
wspider (10 threads)
wspider (1 thread)
GNU Parallel (old and slow - using a lot of resources for nothing)
The crawler used for this comparison is from the GNU Parallel manual: https://www.gnu.org/software/parallel/man.html
Get Started on Linux/macOS
git clone https://github.com/wuseman/wspider
cd wspider
chmod +x wspider.sh
./wspider.sh -u <url> -d <path> -t <threads>
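As an illustration of filling in the placeholders above, the sketch below builds the command line from concrete values and rejects a thread count that is not a positive integer. The helper name `wspider_cmd`, the URL and the destination path are all hypothetical; the check is an assumption about what a sensible value looks like, not something wspider.sh is known to enforce.

```shell
# Hypothetical helper: prints the wspider command line for the given values
# without running it, so the values can be checked first.
wspider_cmd() {
    # $1 = URL, $2 = destination path, $3 = number of threads
    case "$3" in
        ''|*[!0-9]*|0)
            # Empty, non-numeric, or zero thread counts are rejected.
            echo "threads must be a positive integer" >&2
            return 1
            ;;
    esac
    printf './wspider.sh -u %s -d %s -t %s\n' "$1" "$2" "$3"
}

# Placeholder values; replace with your own.
wspider_cmd "https://example.com" "/tmp/mirror" 10
```

Starting with a low thread count, such as the 10 used here, and raising it gradually is a cautious way to see how the target site and your own machine cope.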
System Requirements
- wget2 (GNU Wget2, the successor to GNU Wget)
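Since wspider depends on wget2, it can help to confirm that it is on your PATH before the first run. The following is a small sketch using the standard `command -v` lookup; the helper name `have_cmd` is hypothetical and not part of wspider.

```shell
# Returns 0 if the named program is found on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd wget2; then
    echo "wget2 found: $(command -v wget2)"
else
    echo "wget2 not found; install it before running wspider.sh" >&2
fi
```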
IMPORTANT
wuseman cannot be held responsible for users' actions, or for any damage a user may cause with the information or data wspider collects. All users who gather information or data via wspider are 100% responsible for their own actions. wspider has been developed for legal purposes.