I have been scraping a page for about a year, and since last week I get HTTP 403 every time I connect to it using python-requests.

From my web browser and from curl there is no problem at all. So I assumed the block comes from some check made before the HTTP request is sent. Analysing the packets in Wireshark, I could see some minor differences in the TLS handshake that precedes the HTTP request.

What I would like to do is make some basic changes to the handshake used for the connection. For example, one clear difference between Python and curl or my web browser is that Python/OpenSSL always seems to add padding to the handshake, while the other two do not.

Is there any way to change, for example, the use of padding in the TLS handshake through some option in Requests? If not, how could this be done by modifying a call into the ssl source code or some other library?

This is the python-requests code that gives the 403 error:

import requests

s = requests.session()

s.headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Language': 'pt-BR,pt;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'DNT': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

response = s.get(
    'https://www63.bb.com.br/portalbb/djo/id/resgate/dadosResgate.bbx'
)

And here is the curl command, which does not return 403 and gets me the page I want:

curl "https://www63.bb.com.br/portalbb/djo/id/resgate/dadosResgate.bbx" ^
  -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7" ^
  -H "Accept-Language: pt-BR,pt;q=0.9" ^
  -H "Cache-Control: max-age=0" ^
  -H "Connection: keep-alive" ^
  -H "DNT: 1" ^
  -H "Sec-Fetch-Dest: document" ^
  -H "Sec-Fetch-Mode: navigate" ^
  -H "Sec-Fetch-Site: none" ^
  -H "Sec-Fetch-User: ?1" ^
  -H "Upgrade-Insecure-Requests: 1" ^
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36" ^
  -H "sec-ch-ua: \"Chromium\";v=\"130\", \"Google Chrome\";v=\"130\", \"Not?A_Brand\";v=\"99\"" ^
  -H "sec-ch-ua-mobile: ?0" ^
  -H "sec-ch-ua-platform: \"Windows\""

I've already looked through the source code of the requests and ssl modules, but couldn't find anything useful about TLS handshake configuration.
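For reference, here is a minimal sketch of the handshake-related settings that Python's standard `ssl` module does expose on an `SSLContext`. The function name `build_tls_context` and the cipher string are made up for illustration. The `0x10` bit is an assumption: it is the value OpenSSL's headers give for `SSL_OP_TLSEXT_PADDING`, which Python does not export by name, so it may not apply to every build. Whether any of these changes affects the 403 is untested.

```python
import ssl

# Assumption: value of OpenSSL's SSL_OP_TLSEXT_PADDING (not exported by the ssl module).
ASSUMED_OP_TLSEXT_PADDING = 0x10


def build_tls_context(ciphers: str = "ECDHE+AESGCM") -> ssl.SSLContext:
    """Return a client SSLContext with a few handshake-visible settings changed."""
    # create_default_context() keeps certificate and hostname verification enabled.
    ctx = ssl.create_default_context()

    # Cipher suites offered for TLS 1.2 and below; TLS 1.3 suites are not
    # controlled by set_ciphers().
    ctx.set_ciphers(ciphers)

    # Lowest protocol version offered in the ClientHello.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2

    # Protocols advertised in the ALPN extension.
    ctx.set_alpn_protocols(["http/1.1"])

    # Clear the (assumed) padding workaround bit, which is part of OpenSSL's
    # SSL_OP_ALL defaults.
    ctx.options &= ~ASSUMED_OP_TLSEXT_PADDING
    return ctx


if __name__ == "__main__":
    context = build_tls_context()
    print(context.minimum_version, len(context.get_ciphers()))
```

With Requests, a context like this is normally supplied by subclassing `requests.adapters.HTTPAdapter` and passing `ssl_context=` to the pool manager in `init_poolmanager`, then mounting the adapter on the session for `https://`. Details such as extension order are not exposed by the `ssl` module, so this cannot reproduce a browser's handshake exactly.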

  • It's not likely the handshake. That domain is 'protected' by (i.e. located behind) Cloudflare, which blocks requests it considers abusive, often including scraping. Commented Nov 25, 2024 at 20:24
  • @dave_thompson_085 Even though the chances are slim, I would like to try changing some of these TLS handshake parameters. Do you know if there is any way to do that? The strangest thing is that both requests come from the same IP and have the same headers and body. I can't think of anything else Cloudflare could be checking that would block me only when I use python-requests. If you have any other insights, I would be glad to hear them. Commented Nov 26, 2024 at 12:27
