I use OB2 to download PDF files. Everything works and I can save the files, but I then want to convert them to TXT files so I can parse them, so I tried a shell command with pdftotext:
BLOCK:FileWriteBytes
LABEL:Facture
path = $"Factures\\<CLEDOC[0]>\\<elem>.pdf"
content = @data.RAWSOURCE
ENDBLOCK
BLOCK:ShellCommand
executable = $"pdftotext \".\\Factures\\<CLEDOC[0]>\\<elem>.pdf\" - | sed -n 's/.*\\([fF][rR][0-9][0-9]\\(\\s*\\w\\s*\\)\\{23\\}\\).*/\\1/p' | sed 's/\\s*//g"
=> CAP @shellCommandOutput
ENDBLOCK
Then I got this error (the French message means “The specified file could not be found”):
>> Facture (FileWriteBytes) <<
Wrote bytes to Factures\20210626013018180GNXXSCCFAEAU28050474\20210626013018180GNXXSCCFAEAU28050474.pdf
>> Shell Command (ShellCommand) <<
[Executing block Shell Command] Win32Exception: Le fichier spécifié est introuvable.
BOT ENDED AFTER 28308 ms WITH STATUS: ERROR
It says the PDF file can’t be found, even though the variables CLEDOC[0] and elem are defined and the file was downloaded, as you can see here:
Wrote bytes to Factures\20210626013018180GNXXSCCFAEAU28050474\20210626013018180GNXXSCCFAEAU28050474.pdf
The executable field needs to be the full path to the executable, not a relative path or a bare program name.
For example /usr/bin/pdftotext, and the arguments are then whatever comes after pdftotext in your command. Otherwise it will not work.
I think there is a way for me to also support just the name of the program when it’s in the PATH environment variable, so if you want me to add support for this, please open an issue on GitHub.
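To illustrate the split between executable and arguments, here is a minimal sketch in plain Python 3 (regular CPython, not OB2’s embedded IronPython). It assumes the process is started directly without a shell, which is what the Win32Exception suggests; in that case the `| sed ...` parts of the original command cannot be part of the executable either, so the filtering is done in Python instead. The function names `run_program` and `extract_matches` are made up for this example, and the pattern is meant to be equivalent to the sed expression once whitespace is stripped, except that it takes the first match on each line.

```python
import re
import subprocess
import sys

# "FR" (any case), two digits, then 23 word characters, each of which may be
# preceded by whitespace. Intended to mirror the sed expression in the question.
PATTERN = re.compile(r"([fF][rR][0-9][0-9](?:\s*\w){23})")


def run_program(executable, args):
    """Start `executable` (ideally an absolute path) with a list of arguments.

    No shell is involved, so pipes and redirections are not interpreted.
    """
    result = subprocess.run(
        [executable] + list(args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_matches(text):
    """Return the first match of PATTERN on each line, with whitespace removed."""
    found = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            found.append(re.sub(r"\s+", "", match.group(1)))
    return found


if __name__ == "__main__":
    # Demonstration with the Python interpreter itself as the executable,
    # because its absolute path is always known.
    print(run_program(sys.executable, ["-c", "print('ok')"]).strip())
```

With pdftotext the call would look something like `extract_matches(run_program(r"C:\path\to\pdftotext.exe", [pdf_path, "-"]))`, where the install path is hypothetical and has to be replaced with the real one on your machine.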
Thanks for the help. I copied and pasted your blocks, and I still get the same error:
>> Facture (FileWriteBytes) <<
Wrote bytes to Factures\20210619043034340GNXXSCCFAEAU21061825\20210619043034340GNXXSCCFAEAU21061825.pdf
>> Shell Command (ShellCommand) <<
[Executing block Shell Command] Win32Exception: Le fichier spécifié est introuvable.
BOT ENDED AFTER 22386 ms WITH STATUS: ERROR
I no longer have a problem finding the PyPDF2 module, but now I have a problem importing variables from OB:
[IDLE] CompilationErrorException: (106,32): error CS1525: Invalid expression term '<'
Also, I would like to output the variable “text” from the script: is it enough to put “text” in the output field of OB (as a string variable), and will it be recognized? Or do I need a “return” at the end of the script? If so, what would the complete syntax be, please? I don’t know anything about Python or C# …
Thanks! What about the output? I need the variable “text” from inside the script. If I write “text” in the output field, will it return “text” from the script?
I tested what you said, but it seems the two variables are not recognized:
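On the output question, the idea can be sketched in plain Python 3. This is only a model of the mechanism, under the assumption that the host injects the inputs as ordinary variables and afterwards looks up each output by name in the script’s scope; the helper `run_script` and the sample script are invented for the illustration and are not OB2 code.

```python
# Stand-in for the script body: it only assigns a top-level variable.
SCRIPT = 'text = "parsed content of " + elem'


def run_script(source, inputs, output_name):
    """Run `source` with `inputs` as variables and read one output by name."""
    scope = dict(inputs)       # inputs become plain variables in the script
    exec(source, scope)        # run the script body, no return statement
    return scope[output_name]  # the host looks the output up by its name


print(run_script(SCRIPT, {"elem": "bibi"}, "text"))  # parsed content of bibi
```

Under that assumption no “return” is needed: the script just has to assign a top-level variable whose name matches the one entered in the output field.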
>> Facture (FileWriteBytes) <<
Wrote bytes to Factures\20210622023113130GNXXSCCFAEAU22124802\20210622023113130GNXXSCCFAEAU22124802.pdf
>> Script (GetIronPyScope) <<
Getting a new IronPython scope.
[Executing block Script] DirectoryNotFoundException: Could not find a part of the path 'C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\cledoc0\elem.pdf'.
Well, I don’t know very well how string interpolation works in Python, but I’m pretty sure you need to use the two variables that you injected into the script, so something like
As I said, I don’t know the Python syntax, so you will have to look it up yourself. But I’m sure the way you were doing it before was wrong, because you were not using the variables: you wrote a constant string. Please search for how to interpolate strings in Python, or just use the + operator to join them.
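For reference, both approaches mentioned here (the + operator and string formatting) look roughly like this in Python. The variable names `cledoc0` and `elem` are taken from the error message above and are assumed to be the names of the injected variables; the function names are invented for the example.

```python
def build_pdf_path(cledoc0, elem):
    """Join the two injected values into the relative path of the saved PDF."""
    # A backslash inside a normal string literal has to be doubled.
    return "Factures\\" + cledoc0 + "\\" + elem + ".pdf"


def build_pdf_path_fmt(cledoc0, elem):
    """Same result using str.format placeholders instead of +."""
    return "Factures\\{0}\\{1}.pdf".format(cledoc0, elem)


print(build_pdf_path("blabla", "bibi"))      # Factures\blabla\bibi.pdf
print(build_pdf_path_fmt("blabla", "bibi"))  # Factures\blabla\bibi.pdf
```

Writing `"Factures\\cledoc0\\elem.pdf"` instead produces the literal text cledoc0 and elem in the path, which matches the DirectoryNotFoundException shown above.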
>> Script (GetIronPyScope) <<
Getting a new IronPython scope.
Executed IronPython script with result
C:\Users\Kurosagi\Downloads\OpenBullet2\OpenBullet2\Factures\blabla\bibi.pdf
BOT ENDED AFTER 2335 ms WITH STATUS: NONE
So it works! But when I use it in my script, I get this new error (as a reminder, I don’t know any Python):
Wrote bytes to Factures\20200810152116000ARCHSCCFAEAU23009811\20200810152116000ARCHSCCFAEAU23009811.pdf
>> Script (GetIronPyScope) <<
Getting a new IronPython scope.
[Executing block Script] IndexOutOfRangeException: index out of range: 3
BOT ENDED AFTER 17603 ms WITH STATUS: ERROR