Shell command: can't find my file!

Hello,

I use OB2 to download PDF files. That part works and the files are saved, but I then want to convert them to TXT so I can parse them, so I tried the shell command pdftotext:

BLOCK:FileWriteBytes

LABEL:Facture

  path = $"Factures\\<CLEDOC[0]>\\<elem>.pdf"

  content = @data.RAWSOURCE

ENDBLOCK

BLOCK:ShellCommand

  executable = $"pdftotext \".\\Factures\\<CLEDOC[0]>\\<elem>.pdf\" - | sed -n 's/.*\\([fF][rR][0-9][0-9]\\(\\s*\\w\\s*\\)\\{23\\}\\).*/\\1/p' | sed 's/\\s*//g"

  => CAP @shellCommandOutput

ENDBLOCK

Then I got this error :

>> Facture (FileWriteBytes) <<

Wrote bytes to Factures\20210626013018180GNXXSCCFAEAU28050474\20210626013018180GNXXSCCFAEAU28050474.pdf

 

>> Shell Command (ShellCommand) <<

[Executing block Shell Command] Win32Exception: Le fichier spécifié est introuvable. [English: The specified file cannot be found.]

BOT ENDED AFTER 28308 ms WITH STATUS: ERROR

It says the PDF file can't be found, even though the variables CLEDOC[0] and elem are well defined and the file was downloaded, as you can see here:

Wrote bytes to Factures\20210626013018180GNXXSCCFAEAU28050474\20210626013018180GNXXSCCFAEAU28050474.pdf

I tried this

executable = $"pdftotext \"/mnt/c/Users/Admin/Downloads/OpenBullet2/OpenBullet2/Factures/<CLEDOC[0]>/<elem>.pdf\" - | sed -n 's/.*\\([fF][rR][0-9][0-9]\\(\\s*\\w\\s*\\)\\{23\\}\\).*/\\1/p' | sed 's/\\s*//g"

(this is the path I get when I open a shell window in the <elem> folder)

But I got the same error, even though the file is in the folder…

Can nobody help me? I'm stuck and can't continue my LoliCode…

Executable needs to be the full path to the executable, not a relative path.
For example /usr/bin/pdftotext, and the arguments are whatever comes after pdftotext in your command. Otherwise it will not work.
I think there is a way for me to also support just the name of the program if it's in the PATH env variable, so if you want me to add support for this, please open an issue on GitHub.

“[IDLE] CompilationErrorException: (90,13): error CS1026: ) expected”

That means the error is on line 90, right?

It’s on the 90th line of the generated C# code. Please head to the C# tab and go to line 90 to find out the problem.

which pdftotext

gives me

/usr/bin/pdftotext

So, I typed

BLOCK:ShellCommand

  executable = $"\\usr\\bin\\pdftotext \"Factures\\<CLEDOC[0]>\\<elem>.pdf\" - | sed -n 's/.*\\([fF][rR][0-9][0-9]\\(\\s*\\w\\s*\\)\\{23\\}\\).*/\\1/p' | sed 's/\\s*//g"

  => VAR @IBAN

ENDBLOCK

Same error

I also tested

BLOCK:ShellCommand

  executable = $"/usr/bin/pdftotext \"/Factures/<CLEDOC[0]>/<elem>.pdf\" - | sed -n 's/.*\\([fF][rR][0-9][0-9]\\(\\s*\\w\\s*\\)\\{23\\}\\).*/\\1/p' | sed 's/\\s*//g"

  => VAR @IBAN

ENDBLOCK

Same thing, and the file is still there in the directory!

I used the full path for the PDF file and got the same error as well

BLOCK:ShellCommand

  executable = $"/usr/bin/pdftotext \"/mnt/c/Users/Kurosagi/Downloads/OpenBullet2/OpenBullet2/Factures/<CLEDOC[0]>/<elem>.pdf\" - | sed -n 's/.*\\([fF][rR][0-9][0-9]\\(\\s*\\w\\s*\\)\\{23\\}\\).*/\\1/p' | sed 's/\\s*//g"

  => VAR @IBAN

ENDBLOCK

You are putting everything inside the executable, but you shouldn't, since everything after pdftotext is an argument. Please write it like this.

BLOCK:ShellCommand
  executable = "/usr/bin/pdftotext"
  arguments = $"\"/mnt/c/Users/Kurosagi/Downloads/OpenBullet2/OpenBullet2/Factures/<CLEDOC[0]>/<elem>.pdf\" - | sed -n 's/.*\\([fF][rR][0-9][0-9]\\(\\s*\\w\\s*\\)\\{23\\}\\).*/\\1/p' | sed 's/\\s*//g"
  => VAR @IBAN
ENDBLOCK
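
To illustrate the split between executable and arguments, here is a minimal Python sketch (not OB2's actual implementation). It assumes the process is started directly rather than through a shell; in that case the | and sed parts would simply be handed to pdftotext as extra arguments instead of forming a pipeline. The command line is a shortened, hypothetical version of the one above, and split_command is a made-up helper name.

```python
import shlex

# Hypothetical command line, shortened from the one used in this thread.
command_line = 'pdftotext "Factures/doc/file.pdf" - | sed -n p'


def split_command(line):
    """Split a command line into (executable, argument list) without a shell."""
    tokens = shlex.split(line)
    return tokens[0], tokens[1:]


executable, arguments = split_command(command_line)
print(executable)   # the program to launch
print(arguments)    # everything else, including the literal "|" token
```

If that assumption holds, the filtering done by sed would have to happen in a separate step, for example on the captured output.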

Thanks for the help. I copied and pasted your block, and I still get the same error…

>> Facture (FileWriteBytes) <<

Wrote bytes to Factures\20210619043034340GNXXSCCFAEAU21061825\20210619043034340GNXXSCCFAEAU21061825.pdf

 

>> Shell Command (ShellCommand) <<

[Executing block Shell Command] Win32Exception: Le fichier spécifié est introuvable. [English: The specified file cannot be found.]

BOT ENDED AFTER 22386 ms WITH STATUS: ERROR

I honestly don’t know, maybe try the python way with the suggestions you received and see if that one is working correctly.

I no longer have an issue finding the PyPDF2 module, but now I have a problem importing variables from OB…

[IDLE] CompilationErrorException: (106,32): error CS1525: Invalid expression term '<'

Also, I would like to output the variable “text” from the script. Is it enough to put “text” in the output field of OB (as a string variable), and will it be recognized? Or do I need a “return” at the end of the script? If so, what would the complete syntax be, please? I don't know anything about Python or C#…

C#:

// BLOCK: Script
data.ExecutingBlock("Script");
var tmp_ygwtxq = GetIronPyScope(data);
tmp_ygwtxq.SetVariable(nameof( < CLEDOC[0] > ), < CLEDOC[0] > );
tmp_ygwtxq.SetVariable(nameof( < elem > ), < elem > );
ExecuteIronPyScript(data, tmp_ygwtxq, "Scripts/dcca59f7b1c9d36ca60b40aa758c777e.py");
string text = tmp_ygwtxq.GetVariable<string>("text");

LoliCode:

BLOCK:Script
INTERPRETER:IronPython
INPUT <CLEDOC[0]>, <elem>
BEGIN SCRIPT
import sys
sys.path.append(r"C:\Users\Kurosagi\AppData\Local\Programs\Python\Python39")
import PyPDF2
 
pdffileobj=open(r"C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\<CLEDOC[0]>\<elem>.pdf",'rb')
 
pdfreader=PyPDF2.PdfFileReader(pdffileobj)
 
x=pdfreader.numPages
 
pageobj=pdfreader.getPage(x+1)
 
text=pageobj.extractText()
END SCRIPT
OUTPUT String @text
ENDBLOCK

In the input field you don’t have to put < and > and you cannot put indices.
Please do the following.

string cledoc0 = CLEDOC[0];
BLOCK:Script
INTERPRETER:IronPython
INPUT cledoc0,elem
BEGIN SCRIPT
etc...

Then in the Python script you can use the variables cledoc0 and elem.

Thanks! What about the output? I need the variable “text” from inside the script. If I write “text” in the output field, will it return “text” from the script?

I tested what you said, but it looks like the two variables are not recognized

>> Facture (FileWriteBytes) <<

Wrote bytes to Factures\20210622023113130GNXXSCCFAEAU22124802\20210622023113130GNXXSCCFAEAU22124802.pdf

 

>> Script (GetIronPyScope) <<

Getting a new IronPython scope.

[Executing block Script] DirectoryNotFoundException: Could not find a part of the path 'C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\cledoc0\elem.pdf'.

string cledoc0 = CLEDOC[0];

BLOCK:Script

INTERPRETER:IronPython

INPUT cledoc0, elem

BEGIN SCRIPT

import sys

sys.path.append(r"C:\Users\Kurosagi\AppData\Local\Programs\Python\Python39")

import PyPDF2

 

pdffileobj=open(r"C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\cledoc0\elem.pdf",'rb')

 

pdfreader=PyPDF2.PdfFileReader(pdffileobj)

 

x=pdfreader.numPages

 

pageobj=pdfreader.getPage(x+1)

 

text=pageobj.extractText()

END SCRIPT

OUTPUT String @text

ENDBLOCK

Well, I don't know very well how string interpolation works in Python, but I'm pretty sure you need to use the two variables that you injected into the script, so something like

string cledoc0 = CLEDOC[0];
BLOCK:Script
INTERPRETER:IronPython
INPUT cledoc0, elem
BEGIN SCRIPT
import sys
sys.path.append(r"C:\Users\Kurosagi\AppData\Local\Programs\Python\Python39")
import PyPDF2
pdffileobj=open(f"C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\{cledoc0}\{elem}.pdf",'rb')
pdfreader=PyPDF2.PdfFileReader(pdffileobj)
x=pdfreader.numPages
pageobj=pdfreader.getPage(x+1)
text=pageobj.extractText()
END SCRIPT
OUTPUT String @text
ENDBLOCK

I changed the line with the path to the PDF file.

I tested with your new line and I get:

>> Script (GetIronPyScope) <<

Getting a new IronPython scope.

[Executing block Script] SyntaxErrorException: invalid syntax

:(

I said I don't know the Python syntax, so you have to look it up yourself, but I'm sure the way you were doing it before was wrong because you were not using the variables: you wrote a constant string. Please search for how to interpolate strings in Python, or just use the + operator to join them.
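
As a rough sketch of joining the injected values into a path without f-strings: f-strings need Python 3.6 or later, so if the IronPython interpreter used here follows Python 2.7 syntax (an assumption), that could explain the SyntaxErrorException. BASE_DIR is a hypothetical folder and both helper names are made up for illustration.

```python
import ntpath  # Windows path rules, whatever OS runs this sketch

# Hypothetical base folder; replace it with the real Factures directory.
BASE_DIR = r"C:\OpenBullet2\Factures"


def build_pdf_path(cledoc0, elem):
    """Join base folder, document key and file name using Windows separators."""
    return ntpath.join(BASE_DIR, cledoc0, elem + ".pdf")


def build_pdf_path_format(cledoc0, elem):
    """Same result with str.format, which is also available in Python 2.7."""
    return r"{0}\{1}\{2}.pdf".format(BASE_DIR, cledoc0, elem)


print(build_pdf_path("blabla", "bibi"))
```

Both helpers return the same string; the raw-string prefix r keeps the backslashes from being read as escape sequences.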

string cledoc0 = "blabla" ;

string elem = "bibi" ;

BLOCK:Script

INTERPRETER:IronPython

INPUT cledoc0,elem

BEGIN SCRIPT

import sys

sys.path.append(r"C:\Users\Kurosagi\AppData\Local\Programs\Python\Python39")

import PyPDF2

path = "C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\{}\{}.pdf".format(cledoc0, elem)

END SCRIPT

OUTPUT String @path

ENDBLOCK

LOG path

gives

>> Script (GetIronPyScope) <<

Getting a new IronPython scope.

Executed IronPython script with result

C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\blabla\bibi.pdf

BOT ENDED AFTER 2335 ms WITH STATUS: NONE

So it works! But when I use it in my script, I get this new error (as a reminder, I don't know any Python):


 Wrote bytes to Factures\20200810152116000ARCHSCCFAEAU23009811\20200810152116000ARCHSCCFAEAU23009811.pdf

>> Script (GetIronPyScope) <<

Getting a new IronPython scope.

[Executing block Script] IndexOutOfRangeException: index out of range: 3

BOT ENDED AFTER 17603 ms WITH STATUS: ERROR

with this new script:

string cledoc0 = CLEDOC[0];

BLOCK:Script

INTERPRETER:IronPython

INPUT cledoc0,elem

BEGIN SCRIPT

import sys

sys.path.append(r"C:\Users\Kurosagi\AppData\Local\Programs\Python\Python39")

import PyPDF2

path = "C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\{}\{}.pdf".format(cledoc0, elem)

pdffileobj=open(path,'rb')

pdfreader=PyPDF2.PdfFileReader(pdffileobj)

x=pdfreader.numPages 

pageobj=pdfreader.getPage(x+1) 

text=pageobj.extractText()

END SCRIPT

OUTPUT String @text

ENDBLOCK

LOG text
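
A possible explanation for the IndexOutOfRangeException, sketched in plain Python: PyPDF2 page numbers are zero-based, so a document with x pages has valid indices 0 to x-1, and getPage(x+1) asks for a page past the end. An index of 3 would match a two-page PDF, though that is a guess. The helper names below are hypothetical and do not use PyPDF2 itself.

```python
def valid_page_indices(num_pages):
    """Return the zero-based indices that exist in a num_pages-page document."""
    return list(range(num_pages))


def last_page_index(num_pages):
    """Return the zero-based index of the last page."""
    if num_pages < 1:
        raise ValueError("document has no pages")
    return num_pages - 1


x = 2  # assumed page count, for illustration only
print(valid_page_indices(x))      # indices that can be requested
print(x + 1 in valid_page_indices(x))  # whether index x+1 exists
print(last_page_index(x))         # index to use for the last page
```

In the script above, that would mean calling pdfreader.getPage(x - 1) for the last page, or pdfreader.getPage(0) for the first one.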