Ultimate Guide to Proxy Scraping and Checking Using Python: A Comprehensive Tutorial
In this article, we will walk you through creating a robust Python script to scrape, check, and manage proxies. The script uses a range of Python modules and techniques to fetch proxy lists from websites, check whether proxies are working, and organize them efficiently. We will also show how to handle user interaction through a menu-driven interface. Let's dive into the script and explore its components step by step.
Prerequisites
Before proceeding, make sure you have the following Python libraries installed:
```bash
pip install aiohttp requests beautifulsoup4 fake-headers tabulate cryptography colorama pillow
```
(asyncio ships with the Python standard library, so it does not need to be installed separately.)
Script Overview
The script contains several functions, each designed to handle a specific aspect of the proxy scraping and checking process. It follows an organized structure to:
- Scrape proxies from websites.
- Check the validity of the proxies.
- Organize and backup good proxies.
- Provide a user-friendly interface for selecting actions.
Importing Required Libraries
Here's the list of imported modules in the script:
```python
import re
import aiohttp
import asyncio
import requests
from cryptography.fernet import Fernet
from datetime import datetime, timedelta
from bs4 import BeautifulSoup
from fake_headers import Headers
from colorama import Fore, init
import os
import sys
from tkinter import messagebox, simpledialog
from PIL import Image, ImageTk
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep
```
These modules help with various functionalities like HTTP requests, parsing HTML, encrypting data, handling concurrency, and creating user interfaces.
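For example, Fernet from the cryptography package provides the encryption functionality. Here is a minimal sketch of how a good-proxy file could be encrypted with it; the file names and key handling are illustrative, not the script's exact implementation:
```python
# Illustrative Fernet usage: encrypt a proxy file with a freshly generated key.
key = Fernet.generate_key()        # in practice, store this key somewhere safe
fernet = Fernet(key)

with open('GoodProxy_temp.txt', 'rb') as f:
    encrypted = fernet.encrypt(f.read())

with open('GoodProxy_temp.txt.enc', 'wb') as f:
    f.write(encrypted)
```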
Scraping Proxies
The scrape_proxies() function gathers the proxy list pages from a website. It fetches the main page, collects the links to the free-proxy sub-pages, and returns the URLs from which the IP addresses are later extracted:
```python
async def scrape_proxies():
    url = 'https://www.my-proxy.com/free-proxy-list.html'
    async with aiohttp.ClientSession() as session:
        page = await fetch_url(session, url)
        if page:
            soup = BeautifulSoup(page, 'html.parser')
            links = soup.find_all('a', href=True)
            # Keep only the "free proxy" sub-pages, plus the main page itself
            urls = ['https://www.my-proxy.com/' + link['href'] for link in links if 'free' in link['href']]
            urls.append(url)
            return urls
    # Return an empty list if the main page could not be fetched
    return []

# Function to fetch and parse a single URL
async def fetch_url(session, url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    try:
        async with session.get(url, headers=headers, ssl=False) as response:
            if response.status == 200:
                return await response.text()
            else:
                print(f"Failed to retrieve URL: {url} with status {response.status}")
    except Exception as e:
        print(f"Error fetching URL {url}: {e}")
    return None
```
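The snippet above only collects the sub-page URLs; the IP addresses themselves are extracted once each of those pages has been fetched. A minimal sketch of that step follows; the extract_proxies name and the ip:port regular expression are illustrative, not the original script's exact code:
```python
# Illustrative helper: fetch each proxy page and pull out ip:port pairs.
async def extract_proxies(session, urls):
    proxies = set()
    ip_port = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b')
    for url in urls:
        page = await fetch_url(session, url)
        if page:
            proxies.update(ip_port.findall(page))
    return sorted(proxies)
```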
Checking Proxy Validity
The script can asynchronously check proxies to determine if they are functional. It connects to each proxy using aiohttp
and tests if the proxy responds correctly.
```python
async def check_proxy(session, proxy_types, proxy):
    global cancel_all, good_proxies_count
    for proxy_type in proxy_types:
        if cancel_all:
            return None
        try:
            proxy_dict = {
                "http": f"{proxy_type}://{proxy}",
                "https": f"{proxy_type}://{proxy}",
            }
            header = Headers(headers=False).generate()
            async with session.get('https://ipapi.com/',
                                   headers={'User-Agent': header['User-Agent']},
                                   proxy=proxy_dict['http'],
                                   timeout=10) as response:
                if response.status == 200:
                    # Record the working proxy and its type
                    with open('GoodProxy_temp.txt', 'a') as f:
                        f.write(f'{proxy} | {proxy_type}\n')
                    good_proxies_count += 1
                    print(Fore.GREEN + f'{proxy} is working as {proxy_type}' + Fore.RESET)
                    if stop_threshold and good_proxies_count >= stop_threshold:
                        cancel_all = True
                        print(Fore.YELLOW + "Reached desired number of good proxies." + Fore.RESET)
                    return proxy
        except Exception:
            continue
    return None
```
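The article does not show how check_proxy is driven over a whole proxy list. One straightforward way is to open a single aiohttp session and schedule all checks with asyncio.gather; the sketch below assumes the globals that check_proxy expects, and the check_all name and proxy types are illustrative rather than the original script's code:
```python
# Sketch: run check_proxy concurrently over a list of proxies.
# cancel_all, good_proxies_count and stop_threshold are the globals
# that check_proxy reads and updates.
cancel_all = False
good_proxies_count = 0
stop_threshold = 50   # stop once 50 working proxies are found, for example

async def check_all(proxies, proxy_types=('http', 'https')):
    async with aiohttp.ClientSession() as session:
        tasks = [check_proxy(session, proxy_types, p) for p in proxies]
        results = await asyncio.gather(*tasks)
    # Keep only the proxies that check_proxy reported as working.
    return [p for p in results if p]
```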
Saving Proxies
The script provides functionalities to save working proxies into files and backup existing proxies:
```python
def save_proxies(proxies):
    with open('online_proxy.txt', 'w') as f:
        for proxy in proxies:
            f.write(f"{proxy}\n")
    print(f"Total proxies extracted: {len(proxies)}")
    print("Proxies saved to online_proxy.txt")
```
Running the Script
To run the script, use the following command:
```bash
python your_script.py
```
The program will display a menu where you can select to scrape proxies or check existing ones. The program saves good proxies in a backup file and provides real-time feedback on the proxy status.
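The menu itself is a plain input loop. The sketch below ties together the helpers shown and sketched earlier; the option wording and the scrape_and_save wrapper are illustrative, not the original script's exact code:
```python
# Sketch of a menu-driven entry point.
async def scrape_and_save():
    urls = await scrape_proxies()
    if urls:
        async with aiohttp.ClientSession() as session:
            proxies = await extract_proxies(session, urls)
        save_proxies(proxies)

def main_menu():
    while True:
        print('1) Scrape proxies')
        print('2) Check proxies')
        print('3) Quit')
        choice = input('Select an option: ').strip()
        if choice == '1':
            asyncio.run(scrape_and_save())
        elif choice == '2':
            with open('online_proxy.txt') as f:
                proxies = f.read().split()
            asyncio.run(check_all(proxies))
        elif choice == '3':
            break
        else:
            print('Invalid choice, please try again.')

if __name__ == '__main__':
    main_menu()
```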
Conclusion
This Python script is a comprehensive solution for proxy scraping and checking, equipped with user-friendly features and real-time feedback. With this guide, you can easily modify and extend the script to suit your proxy management needs.
Download resources: click here