Remove duplicate proxies from remote sources?

Hello, dear OB members.
I'm using remote proxies from several different APIs in my job, as in this screenshot.


But after everything is scraped and ready for the job, I end up with about 66k proxies.
The actual number without duplicates should be about 14k. When I scraped the same sources outside OB and deduplicated them, I found that the rest are duplicates, because the APIs share many of the same proxies and only a few are different.
My idea is to keep them updating periodically while removing duplicates automatically through the OB job tab while my job is running, instead of manually scraping them outside, deduplicating, and putting them in a group rather than loading them remotely.

What I do is run a local web server on the same PC as OB2 and use a PHP script to scrape the proxies and write them to 3 files (http, socks4, socks5). That way you only need to add 3 sources to the job (e.g. http://127.0.0.1/http.txt), but you have full control over what gets written to those files, from as many sources as you like. I am also going to add my own proxy check and remove dupes before it writes to the proxy list.


Got you. It's a bit complicated, but nice.

With the help of ChatGPT, I created this script:

<?php
// Define the URLs to scrape proxies from
$urls = [
    'http' => [
        'https://raw.githubusercontent.com/monosans/proxy-list/main/proxies/http.txt',
        'https://raw.githubusercontent.com/ErcinDedeoglu/proxies/main/proxies/http.txt',
        'https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/http.txt',
        'https://raw.githubusercontent.com/mmpx12/proxy-list/master/http.txt',
        'https://raw.githubusercontent.com/ErcinDedeoglu/proxies/main/proxies/https.txt',
        'https://raw.githubusercontent.com/mmpx12/proxy-list/master/https.txt',
        'https://raw.githubusercontent.com/MuRongPIG/Proxy-Master/main/http.txt',
        'https://raw.githubusercontent.com/zevtyardt/proxy-list/main/http.txt',
        'https://raw.githubusercontent.com/ALIILAPRO/Proxy/main/http.txt',
        'https://sunny9577.github.io/proxy-scraper/generated/http_proxies.txt'
    ],
    'socks4' => [
        'https://raw.githubusercontent.com/monosans/proxy-list/main/proxies/socks4.txt',
        'https://raw.githubusercontent.com/ErcinDedeoglu/proxies/main/proxies/socks4.txt',
        'https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/socks4.txt',
        'https://raw.githubusercontent.com/mmpx12/proxy-list/master/socks4.txt',
        'https://raw.githubusercontent.com/MuRongPIG/Proxy-Master/main/socks4.txt',
        'https://raw.githubusercontent.com/zevtyardt/proxy-list/main/socks4.txt',
        'https://raw.githubusercontent.com/ALIILAPRO/Proxy/main/socks4.txt',
        'https://sunny9577.github.io/proxy-scraper/generated/socks4_proxies.txt'
    ],
    'socks5' => [
        'https://raw.githubusercontent.com/monosans/proxy-list/main/proxies/socks5.txt',
        'https://raw.githubusercontent.com/ErcinDedeoglu/proxies/main/proxies/socks5.txt',
        'https://raw.githubusercontent.com/mmpx12/proxy-list/master/socks5.txt',
        'https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/socks5.txt',
        'https://raw.githubusercontent.com/hookzof/socks5_list/master/proxy.txt',
        'https://raw.githubusercontent.com/MuRongPIG/Proxy-Master/main/socks5.txt',
        'https://raw.githubusercontent.com/zevtyardt/proxy-list/main/socks5.txt',
        'https://raw.githubusercontent.com/ALIILAPRO/Proxy/main/socks5.txt',
        'https://sunny9577.github.io/proxy-scraper/generated/socks5_proxies.txt'
    ]
];

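// Function to fetch proxies from a URL (returns an empty array on failure)
function fetch_proxies($url) {
    $proxies = @file_get_contents($url);
    if ($proxies === false) {
        return [];
    }
    // Split on any line ending and trim, so a stray "\r" cannot hide duplicates
    return array_map('trim', preg_split('/\r\n|\r|\n/', $proxies));
}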

// Function to write proxies to a file
function write_proxies($filename, $proxies) {
    file_put_contents($filename, implode("\n", $proxies));
}

// Function to remove duplicates
function remove_duplicates($proxies) {
    return array_unique(array_filter($proxies));
}

// Arrays to hold proxies
$http_proxies = [];
$socks4_proxies = [];
$socks5_proxies = [];

// Fetch and process proxies
foreach ($urls as $type => $url_list) {
    $all_proxies = [];
    foreach ($url_list as $url) {
        $proxies = fetch_proxies($url);
        $all_proxies = array_merge($all_proxies, $proxies);
    }
    $all_proxies = remove_duplicates($all_proxies);

    if ($type == 'http') {
        $http_proxies = $all_proxies;
    } elseif ($type == 'socks4') {
        $socks4_proxies = $all_proxies;
    } elseif ($type == 'socks5') {
        $socks5_proxies = $all_proxies;
    }
}

// Write proxies to files
write_proxies('http.txt', $http_proxies);
write_proxies('socks4.txt', $socks4_proxies);
write_proxies('socks5.txt', $socks5_proxies);

echo "Proxies have been fetched and written to files successfully.";
?>
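
The script above treats two lines as duplicates only when they are byte-for-byte identical, so the same proxy written as `1.2.3.4:80` in one list and `http://1.2.3.4:80` in another would be kept twice. Below is a minimal sketch of a stricter dedupe, assuming the sources use plain `ip:port` lines with an optional scheme prefix. The names `normalize_proxy` and `dedupe_proxies` are hypothetical helpers, not part of the script above or of OB2, and the sketch only accepts IPv4 addresses.

```php
<?php
// Hypothetical helper: reduce a raw line to "ip:port",
// or return null if it does not look like an IPv4 proxy.
function normalize_proxy(string $line): ?string {
    $line = strtolower(trim($line));
    // Drop an optional scheme prefix such as "http://" or "socks5://"
    $line = preg_replace('#^[a-z0-9]+://#', '', $line);
    // This sketch only accepts IPv4:port
    if (!preg_match('/^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$/', $line, $m)) {
        return null;
    }
    $port = (int) $m[2];
    $validIp = filter_var($m[1], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
    if (!$validIp || $port < 1 || $port > 65535) {
        return null;
    }
    return $m[1] . ':' . $port;
}

// Hypothetical helper: normalize every line and keep each proxy once,
// in order of first appearance.
function dedupe_proxies(array $lines): array {
    $seen = [];
    foreach ($lines as $line) {
        $proxy = normalize_proxy($line);
        if ($proxy !== null) {
            $seen[$proxy] = true; // array keys are unique, so repeats collapse
        }
    }
    return array_keys($seen);
}
```

If this fits your sources, `dedupe_proxies($all_proxies)` could stand in for `remove_duplicates($all_proxies)` in the loop. Lines that are not valid `ip:port` entries (blank lines, comments, `user:pass@host` formats) are dropped by this sketch, so adjust the pattern if your lists use those.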

Step-by-Step Guide for Server Setup and Script Execution

  1. Download and Install XAMPP:

    • Go to the XAMPP website.
    • Download the version suitable for your operating system (Windows, Linux, macOS).
    • Follow the installation instructions.
  2. Start XAMPP:

    • Launch XAMPP Control Panel.
    • Start the Apache module by clicking “Start” next to Apache.
  3. Prepare Your PHP Script:

    • Open a text editor (like Notepad++ or Visual Studio Code).
    • Copy the PHP script above into the editor.
    • Save the file as scrape_proxies.php in the htdocs directory of your XAMPP installation (e.g., C:\xampp\htdocs\ on Windows, /Applications/XAMPP/htdocs/ on macOS).
  4. Run the PHP Script:

    • Open your web browser.
    • Navigate to http://localhost/scrape_proxies.php.
    • The script will fetch proxies, remove duplicates, and write the results to http.txt, socks4.txt, and socks5.txt in the same directory. It does not check whether the proxies actually work.
  5. Access the Proxy Lists:

    • HTTP proxies: http://localhost/http.txt.
    • SOCKS4 proxies: http://localhost/socks4.txt.
    • SOCKS5 proxies: http://localhost/socks5.txt.
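
The lists only change when scrape_proxies.php is requested again, so keeping them fresh means calling that URL on some schedule (a scheduled task, for example). A minimal sketch of a guard that skips the scrape when the files are still recent is shown below. `is_stale` is a hypothetical helper, and the 600-second window in the usage note is an arbitrary assumption.

```php
<?php
// Hypothetical helper: true when $file is missing or older than $maxAgeSeconds.
// $now can be passed in for testing; it defaults to the current time.
function is_stale(string $file, int $maxAgeSeconds, ?int $now = null): bool {
    $now = $now ?? time();
    if (!is_file($file)) {
        return true;
    }
    clearstatcache(true, $file); // avoid a cached modification time
    return ($now - filemtime($file)) > $maxAgeSeconds;
}
```

One way to use it would be to put `if (!is_stale('http.txt', 600)) { exit('Lists are still fresh.'); }` near the top of scrape_proxies.php, so that repeated requests within ten minutes do not hit the sources again.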

If you want, I can add a checkbox to dedupe proxies after loading them from the sources. Just open an issue on GitHub; that seems like a very reasonable thing to add.


Yeah, I was looking for a way to remove duplicates but couldn't find one in OB2.
OB1 has it.


Yeah, thanks, that will be a good thing to have.
I've already opened the issue on GitHub as you suggested.
