Advanced Usage of wget2 for Web Operations

Explore advanced wget2 commands for website crawling, data retrieval, and extensive header manipulation. These examples serve as a foundation for security professionals, developers, and researchers aiming to optimize web operations or conduct thorough web application testing.


Parallel Downloading

Download 10 files at the same time

wget2 --max-threads 10 https://releases.ubuntu.com/trusty/ubuntu-14.04.6-desktop-amd64.iso <url2> <url3> ...
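
When the URL list is long, it is often easier to keep the targets in a file; a minimal sketch, assuming a hypothetical urls.txt with one URL per line:

# Fetch every URL listed in urls.txt, up to 10 downloads in parallel
wget2 --max-threads=10 -i urls.txt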

Command Options

Bind to an interface on the local machine

  • Requires elevated capabilities: setcap cap_net_raw+ep <path to wget2>
wget2 --bind-interface=INTERFACE
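
A short sketch of granting the capability and binding downloads to a specific interface (the binary path and interface name are assumptions for illustration):

# Grant wget2 the raw-socket capability once, then bind all traffic to eth0
sudo setcap cap_net_raw+ep /usr/bin/wget2
wget2 --bind-interface=eth0 https://example.com/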

Bind to an address on the local machine

wget2 --bind-address=ADDRESS

Select the type of the progress indicator you wish to use

  • Supported indicator types are none and bar
wget2 --progress=bar 
  • The parameterized types bar:force and bar:force:noscroll will add the effect of --force-progress.
wget2 --progress=bar:force

Download Files

Download only files of certain types, without creating directories

wget2 --mirror --level 0 --no-directories -A 'png' https://www.adb-shell.com
wget2 --mirror --level 0 --no-directories -A 'png,txt' https://www.adb-shell.com      
wget2 --mirror --level 0 --no-directories -A 'mp3,ogg' https://www.adb-shell.com      
wget2 --mirror --level 0 --no-directories -A 'img,iso,bin' https://www.adb-shell.com

Download all files directly without creating folders

 wget2 --mirror --level 0 --no-directories https://www.adb-shell.com/bootloader

Download the contents of the android directory to the current directory

wget2 -e robots=off --recursive --mirror --level 1 --no-parent https://www.adb-shell.com/android

Crawl the URLs listed in 5.txt, saving content only for responses with HTTP status 200

wget2 --max-threads=250 --spider -i 5.txt --save-content-on=200

Retrieve website header information

wget2 https://www.nr1.nu -S

Crawl all bookmarked sites from an exported browser bookmarks file

wget2 --spider --force-html -i bookmarks_5_1_22.html 

Script Example

wget2 --header="Accept-Encoding: all" --referer='localhost' \ 
--mirror --recursive --level 0 --max-threads 15 \
--user-agent='Internet Explorer/1.0 (compatible; Unknown/1.0)' \
https://82.115.149.6/server-status

A maximal wget2 command with a wide range of spoofed headers

This command includes spoofing headers related to user-agent, referer, cookies, authentication, caching, encoding, and more. An attacker might use such techniques to manipulate search engine crawlers, deceive users, or bypass security controls. It's essential for defenders to be aware of these tactics and employ appropriate countermeasures to protect against them.

wget2 https://example.com/page \
--method=GET \
--referer='https://www.google.com' \
--user-agent='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
--header="Accept-Language: en-US,en;q=0.9" \
--header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
--header="Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==" \
--header="Cookie: sessionid=abcdef1234567890" \
--header="Origin: http://www.example.com" \
--header="DNT: 1" \
--header="X-Forwarded-For: 203.0.113.195" \
--header="X-Requested-With: XMLHttpRequest" \
--header="If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT" \
--header="Range: bytes=0-500" \
--header="Content-Type: application/json" \
--header="Host: www.example.com" \
--header="Connection: keep-alive" \
--header="Content-Length: 1024" \
--header="Cache-Control: no-cache" \
--header="Pragma: no-cache" \
--header="Accept-Encoding: gzip, deflate" \
--header="Upgrade-Insecure-Requests: 1" \
--header="TE: trailers" \
--header="If-None-Match: \"etag123\"" \
--header="Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==" \
--header="Accept-Charset: utf-8" \
--header="X-Forwarded-Proto: https" \
--header="X-Forwarded-Host: example.com" \
--header="X-Forwarded-Port: 443" \
--header="X-Forwarded-Server: proxy.example.com" \
--header="X-Real-IP: 203.0.113.195" \
--header="X-Content-Type-Options: nosniff"

Web page retrieval with authentication and a specific user agent

wget2 https://www.nr1.nu \
--method=GET  \
--http-user='some1' \
--http-password='hidden@mail.gov' \
--referer='https://www.google.com' \
--user-agent='(SomeRandomUserAgent) Apple/v1.0' \
--save-headers \
--auth-no-challenge \
--header="Accept-Encoding: all" \
--secure-protocol=auto \
--http2=on \
--https-enforce=soft \
-A '*.html' -r  

Crawler script for web scraping

wget2 --method=GET --password=yourFriend --user=yourFriend \
--http-user=yourFriend --http-password=yourFriend \
--referer='https://random.gov/secr3t/crawler' \
--user-agent='(random Crawler/v1.0.1) Hunter' \
--adjust-extension -o ~/logs/wget2/wget2.log \
--stats-site=h:~/logs/wget2/stats-site.log \
--stats-server=h:~/logs/wget2/stats-server.log \
--stats-tls=h:~/logs/wget2/stats-tls.log \
--stats-ocsp=h:~/logs/wget2/stats-ocsp.log \
--stats-dns=h:~/logs/wget2/stats-dns.log \
--progress=bar --backups=3 --force-progress \
--server-response --quota=0 -e robots=off \
--inet4-only --tcp-fastopen --chunk-size=10M \
--local-encoding=encoding --remote-encoding=encoding \
--verify-save-failed --header='Accept-Charset: iso-8859-2' \
--max-redirect=250 --dns-caching --http2-request-window=250 \
--cut-dirs=100 --unlink --spider --limit-rate=20k --random-wait <target URL>

Creating local copies of websites for offline browsing or backup

mkdir -p ~/logs/wget2/  # Ensure the log directory exists
wget2   \
  --mirror --recursive --level=inf --page-requisites \
  --convert-links --adjust-extension --max-threads=15 \
  --no-parent --timestamping \
  --directory-prefix=./mirror_site \
  --user-agent='/dev/null' \
  --output-file="$HOME/logs/wget2/mirror.log" \
  -e robots=off \
  --header="Accept-Encoding: gzip, deflate" \
  --referer='/dev/null' \
  --header="Host: www.realhost.com"
  https://82.115.149.6/server-status

Header Spoofing

User-Agent

This header identifies the client making the request. Spoofing it can help in scenarios where servers provide different responses based on the user-agent.

--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"

Accept-Language

This header specifies the preferred natural language(s) for the response. Spoofing it might be useful for testing multilingual websites.

--header="Accept-Language: en-US,en;q=0.9"

Accept

Indicates the media types that are acceptable for the response. Spoofing it can be helpful when testing how servers handle different types of content.

--header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"

Authorization

Used to authenticate a user agent with a server, typically when accessing protected resources. Spoofing it might help in testing access control mechanisms.

--header="Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="

Cookie

This header contains stored HTTP cookies previously sent by the server with the Set-Cookie header. Spoofing cookies can simulate different user sessions.

--header="Cookie: name=value; name2=value2"
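
Rather than pasting cookie values by hand, wget2 can also capture and replay a cookie jar, which is closer to simulating a real session (URLs and file name are placeholders):

# Capture cookies set by the server, then reuse them on a follow-up request
wget2 --save-cookies=cookies.txt --keep-session-cookies https://example.com/login
wget2 --load-cookies=cookies.txt https://example.com/account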

Origin

Specifies the origin (scheme, host, and port) that initiated the request. Spoofing it might be useful in CORS (Cross-Origin Resource Sharing) testing.

--header="Origin: http://example.com"

DNT (Do Not Track)

Indicates the user's tracking preference. Spoofing it can be helpful for privacy-related testing.

--header="DNT: 1"

X-Forwarded-For

Typically used in proxy configurations, this header identifies the original IP address of the client. Spoofing it can manipulate server logs or bypass IP-based restrictions.

--header="X-Forwarded-For: 192.0.2.1"

X-Requested-With

Often used in AJAX requests, this header identifies the type of request originating from a browser. Spoofing it might be helpful in testing AJAX-based functionalities.

--header="X-Requested-With: XMLHttpRequest"

Referer

Indicates the URL of the page from which the request originated. Spoofing it can simulate different referral sources.

--referer="https://example.com"

If-Modified-Since

This header allows conditional GET requests. Spoofing it can help in testing cache mechanisms and conditional retrieval.

--header="If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT"

Range

Used for requesting only a portion of a resource. Spoofing it might help in testing partial content retrieval.

--header="Range: bytes=0-500"

Content-Type

Specifies the media type of the resource. Spoofing it can be useful for testing how servers handle different content types.

--header="Content-Type: application/json"

Host

Specifies the domain name of the server. Spoofing it can be helpful when accessing resources via IP address or when testing virtual hosting configurations.

--header="Host: www.example.com"

Connection

Specifies whether the connection should be kept alive or closed after the response is received. Spoofing it might be useful for testing server connection handling.

--header="Connection: keep-alive"

Content-Length

Indicates the size of the message body in bytes. Spoofing it can be useful for testing how servers handle varying content lengths.

--header="Content-Length: 1024"

Cache-Control

Specifies directives for caching mechanisms in both requests and responses. Spoofing it can help in testing cache-related behaviors.

--header="Cache-Control: no-cache"

Pragma

An HTTP/1.0 header retained for backwards compatibility with caches that predate the HTTP/1.1 Cache-Control header. Spoofing it can affect cache directives.

--header="Pragma: no-cache"

Accept-Encoding

Indicates the content encodings that the client can understand. Spoofing it can be helpful for testing content compression mechanisms.

--header="Accept-Encoding: gzip, deflate"

Upgrade-Insecure-Requests

Indicates that the client prefers a secure connection. Spoofing it can simulate secure connection preferences.

--header="Upgrade-Insecure-Requests: 1"

TE

Lists the transfer codings (and, with "trailers", the trailer fields) that the client is willing to accept in the response. Spoofing it can affect how the server encodes the response.

--header="TE: trailers"

If-None-Match

Allows conditional requests based on entity tags. Spoofing it can help in testing cache validation.

--header="If-None-Match: \"etag123\""

Proxy-Authorization

Used to authenticate with a proxy server. Spoofing it might be helpful when testing proxy authentication mechanisms.

--header="Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="

Accept-Charset

Indicates the character sets that are acceptable for the response. Spoofing it can be useful for testing character encoding-related behaviors.

--header="Accept-Charset: utf-8"

X-Forwarded-Proto

Specifies the protocol used in the original request. Spoofing it might be helpful in testing reverse proxy configurations.

--header="X-Forwarded-Proto: https"

X-Forwarded-Host

Specifies the original host requested by the client. Spoofing it can be useful in testing reverse proxy configurations.

--header="X-Forwarded-Host: example.com"

X-Forwarded-Port

Specifies the original port requested by the client. Spoofing it can be useful in testing reverse proxy configurations.

--header="X-Forwarded-Port: 443"

X-Forwarded-Server

Specifies the server that received the original request. Spoofing it can be useful in testing reverse proxy configurations.

--header="X-Forwarded-Server: proxy.example.com"

X-Real-IP

Used to identify the real IP address of the client connecting to a web server through a proxy. Spoofing it can affect server logging or IP-based access controls.

--header="X-Real-IP: 203.0.113.195"

X-Content-Type-Options

Normally a response header that prevents content type sniffing (MIME sniffing) in browsers; sending it in a request mainly tests how servers and proxies handle unexpected headers.

--header="X-Content-Type-Options: nosniff"