feat(TaskProcessing): Add OCR TaskType #56908
Looks good! We could add an input for the language to be extracted and have it default to automatic detection, or add it as an optional input only for providers that make use of it. Both are fine for me, wdyt @julien-nc @kyteinsky?
Not sure the OCR libraries take a "language" param to help them perform an optimal extraction. @marcelklehr Do they?
The latest models don't require a language input, but older libraries like Tesseract may require it. I think an optional input is fine.
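The optional-input variant discussed above could look roughly like the following. This is a sketch, not code from this PR: `ExampleOcrProvider` and the `language` key are hypothetical, and the `EShapeType` and `ShapeDescriptor` definitions are minimal stand-ins for the classes in `OCP\TaskProcessing`, so the snippet runs without a Nextcloud checkout (PHP 8.1+). It assumes the provider declares the extra field through `getOptionalInputShape()`.

```php
<?php
declare(strict_types=1);

// Stand-in for OCP\TaskProcessing\EShapeType (only the case used here).
enum EShapeType {
    case Text;
}

// Stand-in for OCP\TaskProcessing\ShapeDescriptor.
final class ShapeDescriptor {
    public function __construct(
        private string $name,
        private string $description,
        private EShapeType $shapeType,
    ) {
    }

    public function getShapeType(): EShapeType {
        return $this->shapeType;
    }
}

// Hypothetical provider fragment: only providers that can use a
// language hint (e.g. one wrapping Tesseract) would declare this input.
final class ExampleOcrProvider {
    /** @return array<string, ShapeDescriptor> */
    public function getOptionalInputShape(): array {
        return [
            'language' => new ShapeDescriptor(
                'Language',
                'Language of the text in the image; leave empty for automatic detection',
                EShapeType::Text
            ),
        ];
    }
}

// List the optional input keys this provider would advertise.
$shape = (new ExampleOcrProvider())->getOptionalInputShape();
echo implode(', ', array_keys($shape)), PHP_EOL;
```

Keeping the field on the provider side means the task type's required input stays minimal, and providers based on newer models can simply omit it.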
Signed-off-by: Marcel Klehr <mklehr@gmx.net>
public function getInputShape(): array {
    return [
        'input' => new ShapeDescriptor(
            $this->l->t('Input Image'),
            $this->l->t('The image to extract text from'),
            EShapeType::Image
        ),
    ];
}
It would be nice if it were a ListOfFiles, so it can accept both images and PDFs, and several of them in a single task instead of just one, which also keeps the task list in the DB shorter.
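A rough sketch of what that suggestion might look like, for illustration only. The `EShapeType` and `ShapeDescriptor` definitions below are minimal stand-ins for the `OCP\TaskProcessing` classes so the snippet runs on its own (PHP 8.1+), the label strings are made up, and the translation calls from the original method are left out. A matching change to the output shape would presumably be needed too, and is not shown.

```php
<?php
declare(strict_types=1);

// Stand-in for OCP\TaskProcessing\EShapeType (only the cases used here).
enum EShapeType {
    case Image;
    case ListOfFiles;
}

// Stand-in for OCP\TaskProcessing\ShapeDescriptor.
final class ShapeDescriptor {
    public function __construct(
        private string $name,
        private string $description,
        private EShapeType $shapeType,
    ) {
    }

    public function getShapeType(): EShapeType {
        return $this->shapeType;
    }
}

/**
 * Variant of getInputShape() that takes a list of files instead of one image,
 * so a single task can cover several images and PDFs.
 *
 * @return array<string, ShapeDescriptor>
 */
function getInputShape(): array {
    return [
        'input' => new ShapeDescriptor(
            'Input files',
            'The image or PDF files to extract text from',
            EShapeType::ListOfFiles
        ),
    ];
}

// Show which shape type the 'input' slot would carry.
echo getInputShape()['input']->getShapeType()->name, PHP_EOL;
```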
New public API (interfaces and classes in OCP) needs to be mentioned here:
see nextcloud/server#56908 see nextcloud/server#56717 Signed-off-by: Marcel Klehr <mklehr@gmx.net>
see nextcloud/server#56908 see nextcloud/server#56717 Signed-off-by: Marcel Klehr <mklehr@gmx.net> [skip ci]
Summary
Adds a task processing task type for doing OCR
TODO
Checklist