Scraping Free Premium HTTP, SOCKS4, and SOCKS5 Proxies Using Python’s Asyncio, Aiohttp, and Task Gathering
If you're into web scraping, automation, or network management, you’ll often need a list of reliable proxies to rotate during requests. Proxies can help you stay anonymous, bypass rate limits, or access geo-restricted content. In this guide, I’ll walk you through how to scrape free premium HTTP, SOCKS4, and SOCKS5 proxies using Python, leveraging powerful libraries like asyncio, aiohttp, and concurrent.futures for threading. We will also ensure the proxies are fast and reliable by checking their response times.
Why Python for Proxy Scraping?
Python is an excellent tool for web scraping due to its extensive ecosystem of libraries. For this project, I’m using:
- Asyncio: Handles asynchronous programming, allowing us to run multiple tasks concurrently.
- Aiohttp: A powerful asynchronous HTTP client that works with asyncio to make non-blocking HTTP requests (see the short sketch after this list).
- Threading (concurrent.futures): Optional, but helps when you want to process multiple proxy packs in parallel, each thread running its own event loop.
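To make that concrete, here is a minimal, hedged sketch of asyncio and aiohttp working together: two requests are sent concurrently and their status codes printed. The httpbin.org endpoints are just placeholders you can swap for your own.
```python
# Minimal sketch: two non-blocking GET requests sent concurrently with asyncio + aiohttp.
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return url, response.status

async def main():
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            fetch(session, "https://httpbin.org/ip"),
            fetch(session, "https://httpbin.org/get"),
        )
    for url, status in results:
        print(url, status)

asyncio.run(main())
```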
Setting Up the Environment
To follow along with this tutorial, you'll need Python 3 installed on your system (asyncio ships with the standard library, so there's no need to install it separately). Install the remaining libraries by running the following:
```bash
pip install aiohttp fake-useragent
```
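One caveat worth knowing before you run the full script: aiohttp's built-in proxy= argument only supports HTTP proxies. If you also want to test SOCKS4/SOCKS5 proxies, a common workaround (an assumption on my part, not something the original script relies on) is to add the aiohttp-socks connector:
```bash
pip install aiohttp-socks
```
A short sketch of how it's used appears right after the main script.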
Step-by-Step Python Script to Scrape Proxies
Below is a sample Python script that takes a list of free HTTP, SOCKS4, and SOCKS5 proxies (which you can scrape from free proxy sites) and checks their response times to make sure they're fast and working. The script divides the proxies into packs and checks all of the proxies in each pack concurrently.
```python
import aiohttp
import asyncio
import math
import time
from fake_useragent import UserAgent

# Initialize global variables
good_proxies_count = 0
cancel_all = False
stop_threshold = 10  # Stop after finding this number of good proxies


async def check_proxy(session, protocol_choice, proxy):
    """Check if a proxy is working and update the results asynchronously."""
    global good_proxies_count, cancel_all
    for proxy_type in protocol_choice:
        if cancel_all:
            return  # Stop if the threshold is reached
        try:
            # Note: aiohttp's proxy= parameter natively supports HTTP proxies only.
            # For socks4/socks5 URLs you would typically use a connector such as
            # aiohttp-socks (see the sketch below the script).
            proxy_url = f"{proxy_type}://{proxy}"
            headers = {'User-Agent': UserAgent().random}
            start_time = time.time()
            async with session.get('https://httpbin.org/ip',
                                   headers=headers,
                                   proxy=proxy_url,
                                   timeout=aiohttp.ClientTimeout(total=10)) as response:
                if response.status == 200:
                    elapsed_time = time.time() - start_time
                    good_proxies_count += 1
                    print(f"Proxy {proxy} ({proxy_type}) is working! "
                          f"Response time: {elapsed_time:.2f} seconds.")
                    # Write good proxy to file or database here
                    if good_proxies_count >= stop_threshold:
                        cancel_all = True
                        print(f"Reached the threshold of {stop_threshold} good proxies.")
                    return
        except (aiohttp.ClientError, asyncio.TimeoutError, ValueError):
            pass  # Ignore bad proxies


async def check_working_proxies(proxy_list, protocol_choice):
    """Check proxies in batches asynchronously for their functionality."""
    global cancel_all, good_proxies_count
    total_proxies = len(proxy_list)
    print(f'Total unique proxies: {total_proxies}')

    # Divide proxies into packs
    pack_size = total_proxies if total_proxies < 100 else math.ceil(total_proxies / 10)
    proxy_packs = [proxy_list[i:i + pack_size] for i in range(0, total_proxies, pack_size)]

    # Check each pack; proxies within a pack are checked concurrently
    for pack_num, proxy_pack in enumerate(proxy_packs, start=1):
        print(f'Checking proxies in pack {pack_num}/{len(proxy_packs)}')
        async with aiohttp.ClientSession() as session:
            tasks = [check_proxy(session, protocol_choice, proxy) for proxy in proxy_pack]
            await asyncio.gather(*tasks, return_exceptions=True)
        if cancel_all:
            break  # Stop checking if the threshold is reached

    if good_proxies_count > 0:
        print(f"Finished checking proxies. {good_proxies_count} good proxies found.")
    else:
        print("No good proxies found.")


# Sample proxy list (you can scrape these from free proxy sites)
proxy_list = ["123.45.67.89:8080", "98.76.54.32:3128"]  # Example proxies
protocol_choice = ['http', 'socks4', 'socks5']

# Running the script
asyncio.run(check_working_proxies(proxy_list, protocol_choice))
```
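Because aiohttp routes requests through HTTP proxies only, the socks4/socks5 attempts above will fail unless you swap in a SOCKS-capable connector. Here is a minimal, hedged sketch using aiohttp-socks (an extra dependency, not part of the original script); since the connector is bound to the session, each proxy gets its own short-lived session:
```python
# Hedged sketch: check a single proxy URL such as "socks5://123.45.67.89:1080"
# through aiohttp-socks. Requires `pip install aiohttp-socks`.
import asyncio
import time
import aiohttp
from aiohttp_socks import ProxyConnector

async def check_socks_proxy(proxy_url: str) -> bool:
    # ProxyConnector.from_url accepts http://, socks4://, and socks5:// URLs.
    connector = ProxyConnector.from_url(proxy_url)
    try:
        async with aiohttp.ClientSession(connector=connector) as session:
            start = time.time()
            async with session.get('https://httpbin.org/ip',
                                   timeout=aiohttp.ClientTimeout(total=10)) as response:
                if response.status == 200:
                    print(f"{proxy_url} is working! Response time: {time.time() - start:.2f} seconds.")
                    return True
    except Exception:
        pass  # Any connection error means the proxy is dead or too slow
    return False

asyncio.run(check_socks_proxy("socks5://123.45.67.89:1080"))
```
You could call a function like this from check_proxy in place of the shared-session request whenever proxy_type is socks4 or socks5.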
Key Features of the Script
- Asynchronous HTTP Requests: By using aiohttp with asyncio, the script can send multiple requests at once without waiting for one to complete before starting another.
- Proxy Response Time: The script measures the time it takes each proxy to respond, so you only keep fast, reliable proxies for your projects.
- Threshold Setting: You can set a threshold (e.g., 10 working proxies), and the script stops checking once that many working proxies have been found.
- Packs and Concurrency: Proxies are divided into packs for easier management, and within each pack all proxies are checked concurrently with asyncio.gather. If you want to spread whole packs across threads, a ThreadPoolExecutor can be layered on top, as shown in the sketch after this list.
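If you do want to spread packs across threads, here is a minimal, hedged sketch of that pattern: a ThreadPoolExecutor runs one worker per pack, and each worker starts its own event loop with asyncio.run. The check_pack function is a hypothetical stand-in for a single-pack version of check_working_proxies.
```python
# Hedged sketch: check several proxy packs in parallel threads, one event loop per thread.
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def check_pack(pack):
    # Hypothetical stand-in: run the async proxy checks for one pack here.
    await asyncio.sleep(0)  # placeholder for real aiohttp work
    return f"checked {len(pack)} proxies"

def check_pack_in_thread(pack):
    # Each worker thread gets its own event loop via asyncio.run().
    return asyncio.run(check_pack(pack))

proxy_packs = [["123.45.67.89:8080"], ["98.76.54.32:3128"]]
with ThreadPoolExecutor(max_workers=4) as executor:
    for result in executor.map(check_pack_in_thread, proxy_packs):
        print(result)
```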
Why You Should Use This Script
- Performance: This script uses asynchronous programming to maximize performance and minimize wait times when checking proxies.
- Scalability: You can easily adjust the script to handle hundreds or even thousands of proxies by tweaking the pack size (and, optionally, adding a thread pool as shown above).
- Reliability: By checking the response time of each proxy, you can ensure only the fastest proxies are used in your online projects.
Conclusion
Scraping proxies and verifying their performance is essential for any online project involving web scraping, automation, or API management. With Python’s asyncio and aiohttp, this task becomes much more efficient. This script ensures that only fast and reliable proxies make the cut, helping you avoid slow or dead ones.
Feel free to tweak the script to fit your needs, and don’t forget to watch the full YouTube tutorial for a detailed walkthrough!
Download the resources: click here.