Config throws 'invalid' error, what am I doing wrong?

Hey guys.
I made a config that searches startpage.com and extracts the URLs of the results.
GIF of the issue: https://i.imgur.com/BY0w9iS.gif

LoliCode I used:

BLOCK:PuppeteerOpenBrowser
ENDBLOCK

BLOCK:PuppeteerNavigateTo
  url = "https://www.startpage.com/do/mypage.pl?prfe=bac38c6c11849a35192e23eed03e5cd58ca7a0a992c7d66dde9b968457e6b8d1d7f6052df69df20e79ae9492d6295da9c9e9b0cef9ac1fcb337ca4f9701e590fad8f8c6ed796976708f95c8729"
  referer = "https://www.startpage.com"
ENDBLOCK

BLOCK:PuppeteerTypeElement
  findBy = XPath
  identifier = "//*[@id=\"q\"]"
  text = $"<input.USER>"
  timeBetweenKeystrokes = 10
ENDBLOCK

BLOCK:PuppeteerClick
  findBy = XPath
  identifier = "/html/body/div[2]/section/div[2]/div[2]/div/form/button[2]/div/div"
ENDBLOCK

BLOCK:PuppeteerWaitForNavigation
ENDBLOCK

BLOCK:PuppeteerGetAttributeValueAll
  findBy = Class
  identifier = "w-gl__result-url result-link"
  attributeName = "href"
  => VAR @puppeteerGetAttributeValueAllOutput
ENDBLOCK

BLOCK:Keycheck
  banIfNoMatch = False
  KEYCHAIN SUCCESS OR
    STRINGKEY @puppeteerGetAttributeValueAllOutput Contains "https://"
  KEYCHAIN FAIL OR
    STRINGKEY @puppeteerGetAttributeValueAllOutput DoesNotExist "https://"
ENDBLOCK

BLOCK:Parse
  input = @puppeteerGetAttributeValueAllOutput
  RECURSIVE
  MODE:LR
  => VAR @parseOutput
ENDBLOCK

BLOCK:RegexReplace
  original = @parseOutput
  pattern = "(\\[\\[|\\]\\])"
  => VAR @regexReplaceOutput
ENDBLOCK

BLOCK:FileAppendLines
  path = "startpage.com.txt"
  lines = @regexReplaceOutput
ENDBLOCK

BLOCK:Keycheck
  banIfNoMatch = False
  KEYCHAIN SUCCESS AND
    STRINGKEY @regexReplaceOutput Contains "https"
    STRINGKEY @regexReplaceOutput Contains ","
  KEYCHAIN FAIL AND
    STRINGKEY @regexReplaceOutput Contains "https"
    STRINGKEY @regexReplaceOutput Contains ","
ENDBLOCK


Check your datalist when you load it in the Runner, and make sure you selected the Credentials type.

I already did. I also checked that the datalist is encoded in UTF-8.

@Marie Disable headless mode, then run the config to get an idea of what is causing the problem.

Your data type is Credentials, but your actual data is not in that format (USER:PASS). Did you account for that?
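
If the datalist stays in that format, one option is to load it as the Default type and type the whole line instead of the USER slice. A minimal sketch of the changed typing step, assuming the Default type exposes each line as input.DATA (as the config further down the thread does) and keeping every other parameter of the original block unchanged:

BLOCK:PuppeteerTypeElement
  findBy = XPath
  identifier = "//*[@id=\"q\"]"
  text = $"<input.DATA>"
  timeBetweenKeystrokes = 10
ENDBLOCK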

In the Stacker, the browser opens and <input.USER> is used as intended.
But when I run the config as a job, nothing happens apart from a flood of 'invalid' errors.
Please see the attached GIF.
https://i.imgur.com/BY0w9iS.gif

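You can try this instead, with your datalist set to the Default type. It reads each line through input.DATA, replaces whitespace with + so the search terms can go straight into the form body, and sends the search as a plain POST request instead of driving a browser.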

BLOCK:RegexReplace
LABEL:DATA
  original = @input.DATA
  pattern = "\\s"
  replacement = "+"
  => VAR @DATA
ENDBLOCK

BLOCK:HttpRequest
LABEL:SEARCH
  url = "Startpage - Private Search Engine. No Tracking. No Search History."
  method = POST
  httpLibrary = SystemNet
  customHeaders = {("Host", "www.startpage.com"), ("Origin", "https://www.startpage.com"), ("Referer", "https://www.startpage.com/"), ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36 Edg/104.0.1293.47")}
  TYPE:STANDARD
  $"query=<DATA>&language=italiano&lui=english&cat=web&sc=9Ud1SX69ew1c20&abp=-1"
  "application/x-www-form-urlencoded"
ENDBLOCK

BLOCK:Parse
LABEL:URLS
  input = @data.SOURCE
  attributeName = ""
  pattern = "class=\"w-gl__result-url.result-link\"\\s+?href=\"(.+?)\""
  outputFormat = "[1]"
  RECURSIVE
  MODE:Regex
  => VAR @URLS
ENDBLOCK

BLOCK:Keycheck
LABEL:CHECK
  banIfNoMatch = False
  KEYCHAIN FAIL OR
    STRINGKEY @URLS EqualTo ""
ENDBLOCK

BLOCK:FileAppend
LABEL:APPEND
  path = "Startpage.txt"
  content = @URLS
ENDBLOCK

Sir, your brain must be enormous. You solved my issue. Thank you so much. I would lick your armpits for that.

Just give my reply a like, please don't lick anything 🙂

You can add a SUCCESS keychain to the Keycheck block as you prefer.
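
One way to do that, as a sketch: extend the CHECK block from the config above so that a result containing "https" counts as a hit. The "https" condition is an assumption borrowed from the original config's own Keycheck, not necessarily the key suggested in the reply; the FAIL keychain is the one from the answer, and the SUCCESS keychain is listed first, as in the original config.

BLOCK:Keycheck
LABEL:CHECK
  banIfNoMatch = False
  KEYCHAIN SUCCESS OR
    STRINGKEY @URLS Contains "https"
  KEYCHAIN FAIL OR
    STRINGKEY @URLS EqualTo ""
ENDBLOCK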
