The principles behind the different types of proxy IP: More and more people are familiar with proxy IPs. Whether the goal is simply to change IP addresses, to collect large amounts of data, or to operate in gray areas, a proxy IP tool is indispensable.
1. Proxy type
Proxy IPs can be divided into four types: the commonly mentioned transparent, anonymous, and highly anonymous proxy IPs, plus the obfuscated proxy IP. Ranked by their basic security level, the order is: highly anonymous > obfuscated > anonymous > transparent.
2. Proxy principle
The proxy type depends mainly on how the proxy server is configured; different configurations produce different proxy types. Three variables are decisive in that configuration: REMOTE_ADDR, HTTP_VIA, and HTTP_X_FORWARDED_FOR.
1) REMOTE_ADDR
REMOTE_ADDR represents the client's IP, but its value is not supplied by the client; the server sets it from the address of the connection it receives.
If you use a browser to access a website directly, the website's web server (Nginx, Apache, etc.) will set REMOTE_ADDR to the client's IP address.
If we configure a proxy in the browser, the request first goes to the proxy server, which then forwards it to the target website. The website's web server will then set REMOTE_ADDR to the IP of the proxy server.
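To make the point concrete, here is a minimal sketch of a WSGI application that echoes back whatever address the server recorded for its TCP peer. The function names and the sample addresses (taken from the documentation IP ranges) are illustrative assumptions, not part of any particular website's setup.

```python
from wsgiref.util import setup_testing_defaults


def show_remote_addr(environ, start_response):
    """Minimal WSGI app: report the address the server recorded for the TCP peer."""
    # REMOTE_ADDR is filled in by the web server, never by the client.
    remote_addr = environ.get("REMOTE_ADDR", "unknown")
    body = f"REMOTE_ADDR = {remote_addr}\n".encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call_app(remote_addr):
    """Call the app with a fake environ, as if the peer had this address (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REMOTE_ADDR"] = remote_addr

    def start_response(status, headers):
        # The status and headers are not needed for this illustration.
        pass

    return b"".join(show_remote_addr(environ, start_response)).decode("utf-8")


if __name__ == "__main__":
    print(call_app("203.0.113.7"), end="")    # direct visit: the client's own address
    print(call_app("198.51.100.20"), end="")  # via a proxy: the proxy's address
```

In both calls the application code is identical; only the peer address of the connection differs, which is why a proxy changes what the website sees in REMOTE_ADDR.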
2) X-Forwarded-For (XFF)
X-Forwarded-For is an HTTP extension header used to indicate the real IP of the client that originated the HTTP request. When the client uses a proxy, the website's web server sees only the proxy's address and does not know the client's real IP. To address this, the proxy server usually adds an X-Forwarded-For header containing the client's IP.
The format of the X-Forwarded-For request header is as follows:
X-Forwarded-For:client,proxy1,proxy2
client is the IP address of the client; proxy1 is the IP of the first proxy, the one farthest from the server; proxy2 is the IP of the second proxy. As the format shows, a request can pass through multiple layers of proxies between the client and the server.
If an HTTP request passes through three proxies Proxy1, Proxy2, and Proxy3 before reaching the server, with IPs IP1, IP2, and IP3 respectively, and the user's real IP is IP0, then according to the XFF standard, the server will eventually receive the following information:
X-Forwarded-For:IP0,IP1,IP2
Proxy3 connects directly to the server, and it appends IP2 to XFF to indicate that it is forwarding the request on behalf of Proxy2. IP3 does not appear in the list; the server obtains it from the RemoteAddress field instead. HTTP runs on top of TCP, and the HTTP protocol itself has no concept of an IP address. RemoteAddress comes from the TCP connection and is the IP of the device that opened the TCP connection to the server, which is IP3 in this example.
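The three-proxy example can be traced with a small simulation. This is only a sketch of the append-the-peer behaviour described above; the function names are hypothetical, and the placeholder strings IP0 to IP3 stand in for real addresses.

```python
def forward_through(proxy_ip, peer_ip, headers):
    """Simulate one proxy hop.

    The proxy appends the address of the peer it received the request from
    to X-Forwarded-For. Returns the new headers and the address the next
    hop will see as its TCP peer, which is the proxy's own IP.
    """
    new_headers = dict(headers)
    existing = new_headers.get("X-Forwarded-For")
    new_headers["X-Forwarded-For"] = f"{existing}, {peer_ip}" if existing else peer_ip
    return new_headers, proxy_ip


def simulate_chain(client_ip, proxy_ips):
    """Send a request from client_ip through proxy_ips in order.

    Returns (X-Forwarded-For value, remote address seen by the server).
    """
    headers = {}
    peer_ip = client_ip
    for proxy_ip in proxy_ips:
        headers, peer_ip = forward_through(proxy_ip, peer_ip, headers)
    return headers.get("X-Forwarded-For", ""), peer_ip


if __name__ == "__main__":
    xff, remote_address = simulate_chain("IP0", ["IP1", "IP2", "IP3"])
    print("X-Forwarded-For:", xff)
    print("RemoteAddress:", remote_address)
```

Each hop only ever records the address it received the request from, so the last proxy's own IP never enters the header and reaches the server only as the TCP peer address.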
3) HTTP_VIA
Via is an HTTP header that records the proxies and gateways a request passes through. Each proxy server along the path adds one entry, so a request that passes through one proxy carries one entry, and a request that passes through two carries two.
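The article does not spell out how the three variables map to the four proxy types, so the sketch below assumes the commonly cited convention: a transparent proxy passes the client's real IP in X-Forwarded-For, an anonymous proxy puts its own IP there, an obfuscated proxy puts an unrelated IP there, and a highly anonymous proxy sends neither Via nor X-Forwarded-For. The function name and the rule set are illustrative assumptions, written from the point of view of a checker that already knows the client's real IP.

```python
def classify_proxy(client_ip, remote_addr, via=None, x_forwarded_for=None):
    """Guess the proxy type from what the server sees (assumed convention, see above).

    client_ip        the tester's real IP, known in advance
    remote_addr      REMOTE_ADDR as recorded by the server
    via              value of the Via header, or None if absent
    x_forwarded_for  value of X-Forwarded-For, or None if absent
    """
    if remote_addr == client_ip:
        return "no proxy"
    if not via and not x_forwarded_for:
        # Nothing in the request reveals that a proxy is involved.
        return "highly anonymous"
    forwarded = [ip.strip() for ip in (x_forwarded_for or "").split(",") if ip.strip()]
    if client_ip in forwarded:
        return "transparent"
    if forwarded and all(ip == remote_addr for ip in forwarded):
        return "anonymous"
    if forwarded:
        # The header carries an address that is neither the client nor the proxy.
        return "obfuscated"
    # Via is present without X-Forwarded-For: proxy use is visible, the client IP is not.
    return "anonymous"


if __name__ == "__main__":
    client, proxy = "203.0.113.7", "198.51.100.20"
    print(classify_proxy(client, proxy, via="1.1 proxy", x_forwarded_for=client))
    print(classify_proxy(client, proxy, via="1.1 proxy", x_forwarded_for=proxy))
    print(classify_proxy(client, proxy, via="1.1 proxy", x_forwarded_for="192.0.2.99"))
    print(classify_proxy(client, proxy))
```

A real website does not know the client's IP in advance, so a check like this is mainly useful for testing a proxy you control by sending requests to your own endpoint.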
3. Proxy selection
An ordinary anonymous proxy IP hides the client's real IP, but it also changes the request, so the server can tell that a proxy is in use. In other words, the visited website cannot read the client's IP address from the request, but it still knows that you are using a proxy. Some web pages with IP detection can nevertheless find the client's IP.
A highly anonymous proxy does not change the client's request. From the server's perspective, it looks as if a real client browser is accessing it directly. The client's real IP is hidden, and the server will not think that a proxy is in use.
Therefore, when a crawler needs proxy IPs, prefer ordinary anonymous or highly anonymous proxies. In addition, if you want to keep the proxy server from reading the data, it is recommended to use a proxy that supports the HTTPS protocol.
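As a rough illustration of how a crawler might route its requests through a proxy, the sketch below uses Python's standard urllib.request.ProxyHandler. The proxy address and target URL are placeholders, and the helper names are hypothetical; substitute the address of a proxy you are actually allowed to use.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build an opener that sends both http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, proxy_url, timeout=10):
    """Fetch url through the proxy and return the raw response body."""
    opener = make_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder values: replace with a real proxy and target before running.
    PROXY_URL = "http://127.0.0.1:8080"
    TARGET_URL = "https://example.com/"
    try:
        print(fetch(TARGET_URL, PROXY_URL)[:200])
    except OSError as exc:
        print("request failed:", exc)
```

When the target URL uses https, the request is tunnelled through the proxy and encrypted end to end, so the proxy can see which host is contacted but not the page content.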