Parsing and solving a numeric padded captcha

Hi guys.

Have you ever encountered a captcha like this?

[image]

Despite its looks, this is not actually an image: you can highlight the numbers on the page.
Each digit sits in its own span with a different padding-left value, so the digits are displayed in an order that differs from their order in the source code.
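
To see why reading the digits straight from the source gives the wrong answer, here is a minimal Python sketch (outside of OB2, for illustration only). `SOURCE` is a shortened copy of the sample markup used in this post, and the name `source_order` is just my own choice.

```python
import html
import re

# Shortened copy of the sample markup from this post (only the spans are kept).
SOURCE = (
    "<span style='position:absolute;padding-left:60px;padding-top:5px;'>&#55;</span>"
    "<span style='position:absolute;padding-left:26px;padding-top:7px;'>&#51;</span>"
    "<span style='position:absolute;padding-left:42px;padding-top:5px;'>&#52;</span>"
    "<span style='position:absolute;padding-left:10px;padding-top:5px;'>&#54;</span>"
)

# Decode the entities in the order they appear in the source, ignoring padding.
source_order = html.unescape("".join(re.findall(r"&#[0-9]+;", SOURCE)))
print(source_order)  # 7346, which is not what the browser displays
```

The browser shows the span with the smallest padding-left first, so the displayed value differs from this one.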

To solve this captcha we need to:

  • Parse the padding-left values into a list
  • Sort them numerically from the lowest to the highest
  • Use them to parse the HTML entities from the page (one by one in the correct order)
  • Join and decode the resulting HTML entity string
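
If you want to check the logic outside of OB2, the steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `solve_padded_captcha` is made up, and the sample string is a shortened copy of the markup from this post.

```python
import html
import re


def solve_padded_captcha(source: str) -> str:
    """Return the digits in display order (ascending padding-left)."""
    # Step 1: collect (padding-left, entity) pairs from every span.
    pairs = re.findall(r"padding-left:([0-9]+)px[^>]*>(&#[0-9]+;)", source)
    # Step 2: sort numerically by padding (as integers, so 9 sorts before 10).
    pairs.sort(key=lambda pair: int(pair[0]))
    # Steps 3 and 4: take the entities in that order, join them, and decode.
    return html.unescape("".join(entity for _, entity in pairs))


# Shortened copy of the sample markup from this post (only the spans are kept).
SOURCE = (
    "<span style='position:absolute;padding-left:60px;padding-top:5px;'>&#55;</span>"
    "<span style='position:absolute;padding-left:26px;padding-top:7px;'>&#51;</span>"
    "<span style='position:absolute;padding-left:42px;padding-top:5px;'>&#52;</span>"
    "<span style='position:absolute;padding-left:10px;padding-top:5px;'>&#54;</span>"
)

print(solve_padded_captcha(SOURCE))  # 6347
```

The sketch captures the padding and the entity together in one pass, so it does not care how many digits the captcha has. Sorting as integers matters: a plain string sort would put "100" before "26".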

Here’s how the LoliCode looks:

data.SOURCE = "<td align=right><div style='width:80px;height:26px;font:bold 13px Arial;background:#ccc;text-align:left;direction:ltr;'><span style='position:absolute;padding-left:60px;padding-top:5px;'>&#55;</span><span style='position:absolute;padding-left:26px;padding-top:7px;'>&#51;</span><span style='position:absolute;padding-left:42px;padding-top:5px;'>&#52;</span><span style='position:absolute;padding-left:10px;padding-top:5px;'>&#54;</span></div></td>";

BLOCK:Parse
  input = @data.SOURCE
  pattern = "padding-left:([0-9]+)px[^>]*>(&#[0-9]+;)"
  outputFormat = "[1]"
  RECURSIVE
  MODE:Regex
  => VAR @NUMBERS
ENDBLOCK

BLOCK:SortList
  list = @NUMBERS
  numeric = True
ENDBLOCK

BLOCK:Parse
  input = @data.SOURCE
  pattern = $"padding-left:<NUMBERS[0]>px[^>]*>(&#[0-9]+;)"
  outputFormat = "[1]"
  MODE:Regex
  => VAR @NUM1
ENDBLOCK

BLOCK:Parse
  input = @data.SOURCE
  pattern = $"padding-left:<NUMBERS[1]>px[^>]*>(&#[0-9]+;)"
  outputFormat = "[1]"
  MODE:Regex
  => VAR @NUM2
ENDBLOCK

BLOCK:Parse
  input = @data.SOURCE
  pattern = $"padding-left:<NUMBERS[2]>px[^>]*>(&#[0-9]+;)"
  outputFormat = "[1]"
  MODE:Regex
  => VAR @NUM3
ENDBLOCK

BLOCK:Parse
  input = @data.SOURCE
  pattern = $"padding-left:<NUMBERS[3]>px[^>]*>(&#[0-9]+;)"
  outputFormat = "[1]"
  MODE:Regex
  => VAR @NUM4
ENDBLOCK

BLOCK:DecodeHTMLEntities
  input = $"<NUM1><NUM2><NUM3><NUM4>"
  => VAR @SOLUTION
ENDBLOCK

The SOLUTION variable will contain the value 6347, which is the digits in the order they are displayed on the page.
Note: this will only work in OB2 version 0.1.11+ due to a bug present in earlier versions.

Ruri
