Wireshark is a comprehensive network protocol analyzer. It displays every protocol layer, including application-layer protocols such as HTTP and SSL. Although it can capture a multitude of protocols, we focus on HTTP, which is vital to web scraping. Other traffic analyzers are reviewed here.
To isolate HTTP traffic, enter ‘http’ (lowercase) in the Filter input field (circled in red), and the sniffer will show all the HTTP packets with their related data.
Timing is poorly presented, so latency and the time difference between packets are not easy to determine. The details (headers, cookies, links, etc.) appear as you expand each captured line. No DNS support is provided. It is also almost impossible to see at a glance which response matches which request.
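One possible workaround for the pairing and timing problem is to export a few fields with tshark (the command-line tool that ships with Wireshark) and match the lines up in a script. The sketch below assumes the export was produced with something like `tshark -r capture.pcap -Y http -T fields -e frame.number -e frame.time_relative -e http.request.full_uri -e http.response.code -e http.request_in`, where capture.pcap is a hypothetical file name. The function name and the sample data are illustrative only.

```python
def pair_http_exchanges(tsv_text):
    """Pair HTTP requests with their responses.

    Assumes tab-separated lines with exactly these five columns:
    frame.number, frame.time_relative, http.request.full_uri,
    http.response.code, http.request_in
    """
    pending = {}    # request frame number -> (relative time, URI)
    exchanges = []
    for line in tsv_text.splitlines():
        cols = line.split("\t")
        if len(cols) != 5:
            continue  # skip lines that do not match the assumed layout
        frame, rel_time, uri, status, request_in = cols
        if status:
            # A response: http.request_in holds the frame number of its request.
            if request_in in pending:
                req_time, req_uri = pending.pop(request_in)
                exchanges.append({
                    "uri": req_uri,
                    "status": int(status),
                    "latency": round(float(rel_time) - req_time, 6),
                })
        elif uri:
            # A request: remember it until the matching response shows up.
            pending[frame] = (float(rel_time), uri)
    return exchanges


# Made-up sample data in the assumed layout (not a real capture).
SAMPLE = (
    "1\t0.100\thttp://example.com/\t\t\n"
    "2\t0.250\t\t200\t1\n"
    "3\t0.300\thttp://example.com/a.css\t\t\n"
    "4\t0.420\t\t404\t3\n"
)

for exchange in pair_http_exchanges(SAMPLE):
    print(exchange["uri"], exchange["status"], exchange["latency"])
```

Requests that never receive a response simply stay unpaired and are left out of the result.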
Display filters give Wireshark much of its flexibility. To manage them, go to Analyze -> Display Filters on the toolbar menu (then you can press Expression for extra filtering):
These multifunctional filters let you retrieve just the data that is of interest.
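As an illustration of what such filters look like, here is a minimal sketch that assembles a display filter expression from a few common HTTP fields and wraps it in a tshark command line. The helper names, the host example.com and the file name capture.pcap are assumptions made for the example; the field names (http.host, http.request.method, http.response.code) come from the standard Wireshark HTTP dissector.

```python
def build_display_filter(host=None, method=None, status=None):
    """Compose a display filter expression for HTTP traffic."""
    parts = ["http"]
    if host:
        parts.append(f'http.host contains "{host}"')
    if method:
        parts.append(f'http.request.method == "{method}"')
    if status is not None:
        parts.append(f"http.response.code == {status}")
    # "&&" means every condition must hold for the same packet.
    return " && ".join(parts)


def tshark_command(capture_file, display_filter):
    """Return the argument list for reading a capture with a display filter."""
    return ["tshark", "-r", capture_file, "-Y", display_filter]


flt = build_display_filter(host="example.com", method="GET")
print(flt)
print(tshark_command("capture.pcap", flt))
```

The same expression can be pasted straight into the Filter input field. Note that a request condition (method) and a response condition (status code) never apply to the same packet, so combining them with && matches nothing. If tshark is installed, the argument list can be handed to subprocess.run.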
Configuring the sniffer to capture SSL/HTTPS
HTTPS usually travels over port 443, while HTTP traffic uses port 80. To decrypt SSL into HTTP, you must also supply your security credentials. For a tutorial on decrypting SSL into HTTP, provided you have the private key, watch here.
In short, Wireshark is a protocol analyzer well suited to general HTTP analysis. It is well equipped for filtering protocol lines, but not so good at presenting precise data such as timelines or cache behavior.