Advanced Usage of wget2 for Web Operations
Explore advanced wget2 commands for website crawling, data retrieval, and extensive header manipulation. These examples serve as a foundation for security professionals, developers, and researchers aiming to optimize web operations or conduct thorough web application testing.
Parallel Downloading
Download 10 files at the same time
wget2 --max-threads 10 https://releases.ubuntu.com/trusty/ubuntu-14.04.6-desktop-amd64.iso iso2 iso3 iso4....etc....
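For longer lists, the URLs can be kept in a file and passed with -i, as the crawling examples further down do. The following is a minimal sketch under that assumption: the example.com URLs are hypothetical placeholders, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical URL list; replace with the real download URLs.
url_file=$(mktemp)
cat > "$url_file" <<'EOF'
https://example.com/images/a.iso
https://example.com/images/b.iso
https://example.com/images/c.iso
EOF

# Count the non-empty lines, then use one thread per URL, capped at 10.
url_count=$(grep -c . "$url_file")
threads=$url_count
if [ "$threads" -gt 10 ]; then
    threads=10
fi

# Print the command instead of running it, so the sketch has no side effects.
printf 'wget2 --max-threads=%s -i %s\n' "$threads" "$url_file"
```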
Command Options
Bind to interface INTERFACE on the local machine.
- Requires the cap_net_raw capability on the binary:
setcap cap_net_raw+ep <path to wget|wget2>
wget2 --bind-interface=INTERFACE
Bind to address ADDRESS on the local machine.
wget2 --bind-address=ADDRESS
Select the type of the progress indicator you wish to use
- Supported indicator types are none and bar.
wget2 --progress=bar
- The parameterized types bar:force and bar:force:noscroll will add the effect of --force-progress.
wget2 --progress=bar:force
Download Files
Download files of certain file types without creating folders
wget2 --mirror --level 0 --no-directories -A 'png' https://www.adb-shell.com
wget2 --mirror --level 0 --no-directories -A 'png,txt' https://www.adb-shell.com
wget2 --mirror --level 0 --no-directories -A 'mp3,ogg' https://www.adb-shell.com
wget2 --mirror --level 0 --no-directories -A 'img,iso,.bin' https://www.adb-shell.com
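The -A option takes a comma-separated list, so the list can be assembled from individual extensions. The sketch below shows one way to do that; accept_list is a hypothetical helper name, and the command is printed rather than executed.

```shell
#!/bin/sh
# Join the given extensions into a comma-separated list for -A.
accept_list() {
    list=""
    for ext in "$@"; do
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}$ext"
    done
    printf '%s\n' "$list"
}

accept=$(accept_list png jpg gif)
printf "wget2 --mirror --level 0 --no-directories -A '%s' %s\n" \
    "$accept" "https://www.adb-shell.com"
```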
Download all files directly without creating folders
wget2 --mirror --level 0 --no-directories https://www.adb-shell.com/bootloader
Download the contents of the android directory to the current directory
wget2 -e robots=off --recursive --mirror --level 1 --no-parent https://www.adb-shell.com/android
Website crawling with validation from file 5.txt
wget2 --max-threads=250 --spider -i 5.txt --save-content-on=200
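Before a large spider run it may help to clean the input list first. The sketch below assumes a raw list that contains blank lines, duplicates, and non-HTTP entries; the file contents are hypothetical, and the final command is only printed.

```shell
#!/bin/sh
# Hypothetical raw list with a blank line, a duplicate and an FTP entry.
raw=$(mktemp)
clean=$(mktemp)
cat > "$raw" <<'EOF'
https://example.com/

https://example.org/page
https://example.com/
ftp://example.net/file
EOF

# Keep only http(s) URLs and drop duplicates.
grep -E '^https?://' "$raw" | sort -u > "$clean"
clean_count=$(grep -c . "$clean")

printf 'wget2 --spider -i %s\n' "$clean"
```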
Retrieve website header information
wget2 https://www.nr1.nu -S
Crawling all bookmarked sites from exported configuration
wget2 --spider --force-html -i bookmarks_5_1_22.html
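With --force-html, wget2 parses the links out of the exported file itself. To preview which URLs such a file contains, something like the following sketch can be used. It assumes the export uses the common HREF="..." attribute form and that grep supports -o (GNU and BSD grep do); the two bookmark entries are hypothetical.

```shell
#!/bin/sh
# Hypothetical two-entry export in the Netscape bookmark format.
bookmarks=$(mktemp)
cat > "$bookmarks" <<'EOF'
<DT><A HREF="https://example.com/" ADD_DATE="1">Example</A>
<DT><A HREF="https://example.org/docs" ADD_DATE="2">Docs</A>
EOF

# Pull out the value of every HREF attribute, one URL per line.
urls=$(grep -oiE 'href="[^"]+"' "$bookmarks" | cut -d'"' -f2)
printf '%s\n' "$urls"
```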
Script Example
wget2 --header="Accept-Encoding: all" --referer='localhost' \
--mirror --recursive --level 0 --max-threads 15 \
--user-agent='Internet Explorer/1.0 (compatible; Unknown/1.0)' \
https://82.115.149.6/server-status
Maximized wget2 command with various headers spoofed
This command includes spoofing headers related to user-agent, referer, cookies, authentication, caching, encoding, and more. An attacker might use such techniques to manipulate search engine crawlers, deceive users, or bypass security controls. It's essential for defenders to be aware of these tactics and employ appropriate countermeasures to protect against them.
wget2 https://example.com/page \
--method=GET \
--referer='https://www.google.com' \
--user-agent='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
--header="Accept-Language: en-US,en;q=0.9" \
--header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
--header="Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==" \
--header="Cookie: sessionid=abcdef1234567890" \
--header="Origin: http://www.example.com" \
--header="DNT: 1" \
--header="X-Forwarded-For: 203.0.113.195" \
--header="X-Requested-With: XMLHttpRequest" \
--header="If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT" \
--header="Range: bytes=0-500" \
--header="Content-Type: application/json" \
--header="Host: www.example.com" \
--header="Connection: keep-alive" \
--header="Content-Length: 1024" \
--header="Cache-Control: no-cache" \
--header="Pragma: no-cache" \
--header="Accept-Encoding: gzip, deflate" \
--header="Upgrade-Insecure-Requests: 1" \
--header="TE: trailers" \
--header="If-None-Match: \"etag123\"" \
--header="Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==" \
--header="Accept-Charset: utf-8" \
--header="X-Forwarded-Proto: https" \
--header="X-Forwarded-Host: example.com" \
--header="X-Forwarded-Port: 443" \
--header="X-Forwarded-Server: proxy.example.com" \
--header="X-Real-IP: 203.0.113.195" \
--header="X-Content-Type-Options: nosniff"
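A long option list like the one above can also be assembled from a plain list of headers. The sketch below is one possible approach, not the only one: it uses the shell's positional parameters so that every header stays a single argument, and it prints the resulting arguments instead of running wget2. The three headers are a small subset chosen for illustration.

```shell
#!/bin/sh
# Start the argument list, then append one --header option per input line.
set -- --method=GET
while IFS= read -r header; do
    # Skip empty lines in the list.
    [ -n "$header" ] || continue
    set -- "$@" "--header=$header"
done <<'EOF'
Accept-Language: en-US,en;q=0.9
DNT: 1
Cache-Control: no-cache
EOF

arg_count=$#
second_arg=$2

# Print one argument per line; replace printf with the real call when ready.
printf '%s\n' wget2 "$@" https://example.com/page
```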
Web page retrieval with authentication and specific user agent
wget2 https://www.nr1.nu \
--method=GET \
--http-user='some1' \
--http-password='hidden@mail.gov' \
--referer='https://www.google.com' \
--user-agent='(SomeRandomUserAgent) Apple/v1.0)' \
--save-headers \
--auth-no-challenge \
--header="Accept-Encoding: all" \
--secure-protocol=auto \
--http2=on \
--https-enforce=soft \
-A '*.html' -r
Crawler script for web scraping
wget2 --method=GET --password=yourFriend --user=yourFriend \
--http-user=yourFriend --http-password=yourFriend \
--referer='https://random.gov/secr3t/crawler' \
--user-agent='(random Crawler/v1.0.1) Hunter)' \
--adjust-extension -o ~/logs/wget2/wget2.log \
--stats-site="h:$HOME/logs/wget2/stats-site.log" \
--stats-server="h:$HOME/logs/wget2/stats-server.log" \
--stats-tls="h:$HOME/logs/wget2/stats-tls.log" \
--stats-ocsp="h:$HOME/logs/wget2/stats-ocsp.log" \
--stats-dns="h:$HOME/logs/wget2/stats-dns.log" \
--progress=bar --backups=backups --force-progress \
--server-response --quota=0 -e robots=off \
--inet4-only --tcp-fastopen --chunk-size=10M \
--local-encoding=encoding --remote-encoding=encoding \
--verify-save-failed --header='Accept-Charset: iso-8859-2' \
--max-redirect=250 --dns-caching --http2-request-window=250 \
--cut-dirs=100 --unlink --spider --limit-rate=20k --random-wait \
<URL>
Creating local copies of websites for offline browsing or backup
mkdir -p ~/logs/wget2/ # Ensure the log directory exists
wget2 \
--mirror --recursive --level=inf --page-requisites \
--convert-links --adjust-extension --max-threads=15 \
--no-parent --timestamping \
--directory-prefix=./mirror_site \
--user-agent='/dev/null' \
--output-file="$HOME/logs/wget2/mirror.log" \
-e robots=off \
--header="Accept-Encoding: gzip, deflate" \
--referer='/dev/null' \
--header="Host: www.realhost.com" \
https://82.115.149.6/server-status
Header Spoofing
User-Agent
This header identifies the client making the request. Spoofing it can help in scenarios where servers provide different responses based on the user-agent.
--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"
Accept-Language
This header specifies the preferred natural language(s) for the response. Spoofing it might be useful for testing multilingual websites.
--header="Accept-Language: en-US,en;q=0.9"
Accept
Indicates the media types that are acceptable for the response. Spoofing it can be helpful when testing how servers handle different types of content.
--header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
Authorization
Used to authenticate a user agent with a server, typically when accessing protected resources. Spoofing it might help in testing access control mechanisms.
--header="Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
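The value after "Basic" is the Base64 encoding of user:password. The example value above decodes to Aladdin:open sesame, the sample credentials used in the HTTP Basic authentication RFC. A minimal sketch for producing such a value, assuming the base64 utility is installed; basic_auth is a hypothetical helper name.

```shell
#!/bin/sh
# Encode "user:password" for a Basic Authorization header.
basic_auth() {
    printf '%s:%s' "$1" "$2" | base64
}

token=$(basic_auth 'Aladdin' 'open sesame')
printf '%s\n' "--header=Authorization: Basic $token"
```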
Cookie
This header contains stored HTTP cookies previously sent by the server with the Set-Cookie header. Spoofing cookies can simulate different user sessions.
--header="Cookie: name=value; name2=value2"
Origin
Specifies the URI of the page from which the request originated. Spoofing it might be useful in CORS (Cross-Origin Resource Sharing) testing.
--header="Origin: http://example.com"
DNT (Do Not Track)
Indicates the user's tracking preference. Spoofing it can be helpful for privacy-related testing.
--header="DNT: 1"
X-Forwarded-For
Typically used in proxy configurations, this header identifies the original IP address of the client. Spoofing it can manipulate server logs or bypass IP-based restrictions.
--header="X-Forwarded-For: 192.0.2.1"
X-Requested-With
Often used in AJAX requests, this header identifies the type of request originating from a browser. Spoofing it might be helpful in testing AJAX-based functionalities.
--header="X-Requested-With: XMLHttpRequest"
Referer
Indicates the URL of the page from which the request originated. Spoofing it, as in the wget2 commands above, can simulate different referral sources.
--referer="https://example.com"
If-Modified-Since
This header allows conditional GET requests. Spoofing it can help in testing cache mechanisms and conditional retrieval.
--header="If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT"
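The header value must be an HTTP-date in GMT. A sketch for generating one from a Unix timestamp follows; it assumes GNU date (the -d @SECONDS form), and http_date is a hypothetical helper name. The timestamp 783459811 corresponds to the date used above.

```shell
#!/bin/sh
# Format a Unix timestamp as an HTTP-date (GNU date syntax: -d @SECONDS).
# LC_ALL=C forces English day and month names.
http_date() {
    LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

stamp=$(http_date 783459811)
printf '%s\n' "--header=If-Modified-Since: $stamp"
```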
Range
Used for requesting only a portion of a resource. Spoofing it might help in testing partial content retrieval.
--header="Range: bytes=0-500"
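Byte ranges are inclusive on both ends, so bytes=0-500 asks for 501 bytes, not 500. A small sketch of that arithmetic; range_length is a hypothetical helper name.

```shell
#!/bin/sh
# Number of bytes covered by an inclusive "bytes=FIRST-LAST" range.
range_length() {
    first=$1
    last=$2
    echo $(( last - first + 1 ))
}

range_length 0 500   # prints 501
```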
Content-Type
Specifies the media type of the resource. Spoofing it can be useful for testing how servers handle different content types.
--header="Content-Type: application/json"
Host
Specifies the domain name of the server (as in the mirroring command above). Spoofing it can be helpful when accessing resources via IP address or when testing virtual hosting configurations.
--header="Host: www.example.com"
Connection
Specifies whether the connection should be kept alive or closed after the response is received. Spoofing it might be useful for testing server connection handling.
--header="Connection: keep-alive"
Content-Length
Indicates the size of the message body in bytes. Spoofing it can be useful for testing how servers handle varying content lengths.
--header="Content-Length: 1024"
Cache-Control
Specifies directives for caching mechanisms in both requests and responses. Spoofing it can help in testing cache-related behaviors.
--header="Cache-Control: no-cache"
Pragma
An HTTP/1.0 header retained for backward compatibility with caches that do not understand the HTTP/1.1 Cache-Control header. Spoofing it can affect cache directives.
--header="Pragma: no-cache"
Accept-Encoding
Indicates the content encodings that the client can understand. Spoofing it can be helpful for testing content compression mechanisms.
--header="Accept-Encoding: gzip, deflate"
Upgrade-Insecure-Requests
Signals that the client prefers an encrypted and authenticated response and can handle a redirect to HTTPS. Spoofing it can simulate secure connection preferences.
--header="Upgrade-Insecure-Requests: 1"
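TE (accepted transfer codings)
Specifies the transfer codings the client is willing to accept in the response, and whether it accepts trailer fields. It is distinct from Transfer-Encoding, which describes the coding actually applied to a message body. Spoofing it can affect how the server encodes the response.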
--header="TE: trailers"
If-None-Match
Allows conditional requests based on entity tags. Spoofing it can help in testing cache validation.
--header="If-None-Match: \"etag123\""
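Entity tags are sent with their double quotes, which is why the value above is escaped. As a sketch, the same argument can be written with single quotes so that no backslashes are needed; both variables below hold the identical string.

```shell
#!/bin/sh
# Two equivalent ways to pass a header value that contains double quotes.
escaped="--header=If-None-Match: \"etag123\""
single='--header=If-None-Match: "etag123"'

# Show the argument exactly as wget2 would receive it.
printf '%s\n' "$single"
```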
Proxy-Authorization
Used to authenticate with a proxy server. Spoofing it might be helpful when testing proxy authentication mechanisms.
--header="Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
Accept-Charset
Indicates the character sets that are acceptable for the response. Spoofing it can be useful for testing character encoding-related behaviors.
--header="Accept-Charset: utf-8"
X-Forwarded-Proto
Specifies the protocol used in the original request. Spoofing it might be helpful in testing reverse proxy configurations.
--header="X-Forwarded-Proto: https"
X-Forwarded-Host
Specifies the original host requested by the client. Spoofing it can be useful in testing reverse proxy configurations.
--header="X-Forwarded-Host: example.com"
X-Forwarded-Port
Specifies the original port requested by the client. Spoofing it can be useful in testing reverse proxy configurations.
--header="X-Forwarded-Port: 443"
X-Forwarded-Server
Specifies the server that received the original request. Spoofing it can be useful in testing reverse proxy configurations.
--header="X-Forwarded-Server: proxy.example.com"
X-Real-IP
Used to identify the real IP address of the client connecting to a web server through a proxy. Spoofing it can affect server logging or IP-based access controls.
--header="X-Real-IP: 203.0.113.195"
X-Content-Type-Options
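A response header that tells browsers not to sniff the content type (MIME sniffing). Sending it in a request normally has no effect, so spoofing it is mainly useful for checking how a server handles unexpected request headers.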
--header="X-Content-Type-Options: nosniff"