RESTful API

Nidaba includes a RESTful API server and an experimental web user interface. To start up the server locally just run:

$ nidaba api_server

To create batches remotely you can use the normal nidaba commands by adding the -h/--host option:

$ nidaba batch -h http://127.0.0.1:8080/api/v1 --grayscale -l tesseract -o tesseract:languages=eng,extended=True -- input.tif

or:

$ nidaba status -h http://127.0.0.1:8080/api/v1 cf644c49-01b9-44e3-82fc-a4073f0980ef

Schema

All data is sent and received as JSON.

Client Errors

HTTP Verbs

Where possible, the API strives to use appropriate HTTP verbs for each action.

API Reference

GET /api/v1/tasks/(group)/(task)
GET /api/v1/tasks/(group)
GET /api/v1/tasks

Retrieves the list of available tasks, their arguments and valid values for those arguments.

** Request **

GET /tasks

** Response **

HTTP/1.1 200 OK

{
    "img": {
        "deskew": {}, 
        "dewarp": {}, 
        "rgb_to_gray": {}
    },
    "binarize": {
        "nlbin": {
            "border": "float", 
            "escale": "float", 
            "high": [
                0, 
                100
            ], 
            "low": [
                0,
                100
            ]
        }, 
        "otsu": {}, 
        "sauvola": {
            "factor": [
                0.0, 
                1.0
            ], 
            "whsize": "int"
        }
    },
    "segmentation": {
        "kraken": {}, 
        "tesseract": {}
    },
    "ocr": {
        "kraken": {
            "model": [
                "fraktur.pyrnn.gz", 
                "default", 
                "teubner"
            ]
        }, 
        "tesseract": {
            "extended": [
                false, 
                true
            ], 
            "languages": [
                "chr", 
                "chi_tra", 
                "ita_old", 
                "ceb"
            ]
        }
    }, 
    "postprocessing": {
        "spell_check": {
            "filter_punctuation": [
                true, 
                false
            ], 
            "language": [
                "latin", 
                "polytonic_greek"
            ]
        }
    },
    "output": {
        "metadata": {
            "metadata": "file", 
            "validate": [
                true, 
                false
            ]
        }, 
        "tei2hocr": {}, 
        "tei2simplexml": {}, 
        "tei2txt": {}
    }
}

It is also possible to retrieve only a subset of the task definitions by appending a task group, and optionally a task name, to the request path:

** Request **

GET /tasks/segmentation

** Response **

HTTP/1.1 200 OK

{
    "segmentation": {
        "kraken": {}, 
        "tesseract": {}
    }
}
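The three forms of the task listing differ only in how many path segments follow /tasks. A minimal Python sketch of a client helper follows; the API_ROOT value is an assumption taken from the command line example, and tasks_url and fetch_tasks are hypothetical names, not part of nidaba.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed API root (host and port copied from the CLI example); adjust as needed.
API_ROOT = "http://127.0.0.1:8080/api/v1"


def tasks_url(group=None, task=None, root=API_ROOT):
    """Build the URL for the full task list, one group, or one task."""
    if task is not None and group is None:
        raise ValueError("a task name requires its group")
    parts = [root.rstrip("/"), "tasks"]
    # Append only the path segments that were actually given.
    parts += [quote(p, safe="") for p in (group, task) if p is not None]
    return "/".join(parts)


def fetch_tasks(group=None, task=None, root=API_ROOT):
    """GET the task definitions and decode the JSON response body."""
    with urlopen(tasks_url(group, task, root)) as resp:
        return json.load(resp)
```

Calling fetch_tasks("segmentation") against a running server should return a dictionary shaped like the response above.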

Currently there are four different argument types:

  • "int": An integer

  • "float": A float (floats serialized as integers, e.g. 1.0 as 1, are also accepted)

  • "str": A UTF-8 encoded string

  • "file": A file on the storage medium, referenced by its URL

Finally, an argument may be described by a list of valid values, from which one or more values may be picked, or by a value range.
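A client may want to check a value against one of the four type names before sending it. The following Python sketch shows one way to do that; matches_type is a hypothetical helper, and the exact rules the server applies (for instance whether it rejects booleans for numeric arguments) are an assumption.

```python
def matches_type(type_name, value):
    """Return True if value is acceptable for one of the four type names."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; assume booleans are never
        # valid where a number, string, or file is expected.
        return False
    if type_name == "int":
        return isinstance(value, int)
    if type_name == "float":
        # Floats serialized as integers (1 instead of 1.0) are also accepted.
        return isinstance(value, (int, float))
    if type_name in ("str", "file"):
        # A "file" is referenced by its URL, so it is sent as a string too.
        return isinstance(value, str)
    raise ValueError(f"unknown argument type: {type_name!r}")
```

Lists of valid values and value ranges are deliberately left out of this sketch.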

POST /api/v1/batch

Creates a new batch and returns its identifier.

** Request **

POST /batch

** Response **

HTTP/1.1 201 CREATED

{
    "id": "78a1f1e4-cc76-40ce-8a98-77b54362a00e", 
    "url": "/batch/78a1f1e4-cc76-40ce-8a98-77b54362a00e"
}
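Creating a batch from Python could look like the sketch below, which uses only the standard library. API_ROOT is an assumed address and both function names are hypothetical; the request carries an empty body because the example above shows no payload.

```python
import json
from urllib.request import Request, urlopen

# Assumed API root (host and port copied from the CLI example); adjust as needed.
API_ROOT = "http://127.0.0.1:8080/api/v1"


def new_batch_request(root=API_ROOT):
    """Prepare the POST request that creates a batch (nothing is sent yet)."""
    return Request(root.rstrip("/") + "/batch", data=b"", method="POST")


def create_batch(root=API_ROOT):
    """Send the request and return the identifier of the new batch."""
    with urlopen(new_batch_request(root)) as resp:
        return json.load(resp)["id"]
```

The returned identifier is the batch_id used by the remaining endpoints.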

Status Codes:

POST /api/v1/batch/(batch_id)/tasks/(group)/(task)
POST /api/v1/batch/(batch_id)/tasks/(group)
POST /api/v1/batch/(batch_id)/tasks

Adds a particular configuration of a task to the batch identified by batch_id.

** Request **

POST /batch/:batch_id/tasks/:group/:task

{
    "kwarg_1": "value",
    "kwarg_2": 10,
    "kwarg_3": "true",
    "kwarg_4": ["a", "b"],
    "kwarg_5": "/pages/:batch_id/path"
}

** Response **

HTTP/1.1 201 CREATED

To pass files as arguments, use the URL returned by the call that added them to the batch. Booleans are sent as strings containing either 'True'/'true' or 'False'/'false'.
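Because booleans travel as strings, a client has to convert them before serializing the arguments. The Python sketch below shows one way to build such a request; encode_task_args and add_task_request are hypothetical names, API_ROOT is an assumed address, and sending the body with a JSON content type is an assumption based on the statement that all data is exchanged as JSON.

```python
import json
from urllib.request import Request

# Assumed API root (host and port copied from the CLI example); adjust as needed.
API_ROOT = "http://127.0.0.1:8080/api/v1"


def encode_task_args(kwargs):
    """Serialize task arguments, writing booleans as 'true'/'false' strings."""
    converted = {}
    for key, value in kwargs.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        converted[key] = value
    return json.dumps(converted).encode("utf-8")


def add_task_request(batch_id, group, task, kwargs, root=API_ROOT):
    """Prepare the POST request that adds one configured task to a batch."""
    url = f"{root.rstrip('/')}/batch/{batch_id}/tasks/{group}/{task}"
    return Request(url, data=encode_task_args(kwargs), method="POST",
                   headers={"Content-Type": "application/json"})
```

Passing the prepared request to urllib.request.urlopen sends it; a 201 response indicates the task was added.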

Status Codes:

GET /api/v1/batch/(batch_id)/tasks/(group)/(task)
GET /api/v1/batch/(batch_id)/tasks/(group)
GET /api/v1/batch/(batch_id)/tasks

Retrieves the list of tasks and their argument values associated with a batch, optionally limited to a specific group.

** Request **

GET /batch/:batch_id/tasks    

** Response **

HTTP/1.1 200 OK

{
    "segmentation": [
        ["tesseract", {}]
    ],
    "ocr": [
        ["kraken", 
            {
                "model": "teubner"
            }
        ]
    ]
}
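The response maps each group to a list of [task, arguments] pairs, so it can be flattened into one sequence of configured tasks. A small Python sketch, with iter_batch_tasks as a hypothetical helper name:

```python
def iter_batch_tasks(listing):
    """Yield (group, task, arguments) triples from a parsed task listing."""
    for group, entries in listing.items():
        # Each entry is a two-element list: the task name and its arguments.
        for task, arguments in entries:
            yield group, task, arguments


# The response shown above, already decoded from JSON.
listing = {
    "segmentation": [["tesseract", {}]],
    "ocr": [["kraken", {"model": "teubner"}]],
}

for group, task, arguments in iter_batch_tasks(listing):
    print(f"{group}.{task}: {arguments}")
```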

To limit the output to a specific group of tasks, e.g. segmentation or binarization, append the group to the URL:

** Request **

GET /batch/:batch_id/tasks/:group

** Response **

HTTP/1.1 200 OK

{
    "group": [
        ["tesseract", {}],
        ["kraken", {}]
    ]
}

Status Codes:

POST /api/v1/batch/(batch_id)/pages

Adds a page (really any type of file) to the batch identified by batch_id.

** Request **

POST /batch/:batch_id/pages

** Response **

HTTP/1.1 201 CREATED

[
    {
        "name": "0033.tif",
        "url": "/pages/63ca3ec7-2592-4c7d-9009-913aac42535d/0033.tif"
    }
]

Form Parameters:
 
  • scans – file(s) to add to the batch
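Files are uploaded through the scans form field, which implies a multipart/form-data request. The Python sketch below builds such a body with the standard library only; encode_scans and add_pages_request are hypothetical names, API_ROOT is an assumed address, and the generic octet-stream part type is an assumption. File names containing double quotes are not escaped in this sketch.

```python
import uuid
from urllib.request import Request

# Assumed API root (host and port copied from the CLI example); adjust as needed.
API_ROOT = "http://127.0.0.1:8080/api/v1"


def encode_scans(files, field="scans"):
    """Encode (filename, bytes) pairs as a multipart/form-data body.

    Returns the body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for filename, payload in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        chunks += [header.encode("utf-8"), payload, b"\r\n"]
    # The closing delimiter ends the multipart body.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def add_pages_request(batch_id, files, root=API_ROOT):
    """Prepare the POST request that uploads files to a batch."""
    body, content_type = encode_scans(files)
    url = f"{root.rstrip('/')}/batch/{batch_id}/pages"
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": content_type})
```

Every file is sent under the same field name, so several scans can be added in one request.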

Status Codes:

GET /api/v1/batch/(batch_id)/pages

Returns the list of pages associated with the batch identified by batch_id.

** Request **

GET /batch/:batch_id/pages

** Response **

HTTP/1.1 200 OK

[
    {
        "name": "0033.tif", 
        "url": "/pages/63ca3ec7-2592-4c7d-9009-913aac42535d/0033.tif"
    }, 
    {
        "name": "0072.tif", 
        "url": "/pages/63ca3ec7-2592-4c7d-9009-913aac42535d/0072.tif"
    }, 
    {
        "name": "0014.tif", 
        "url": "/pages/63ca3ec7-2592-4c7d-9009-913aac42535d/0014.tif"
    }
]
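The url values in this listing begin with /pages, like the request paths in this reference, so they appear to be relative to the API root rather than to the host. Under that assumption, a Python sketch for turning the listing into absolute download URLs could look like this (page_urls is a hypothetical helper and API_ROOT an assumed address):

```python
# Assumed API root (host and port copied from the CLI example); adjust as needed.
API_ROOT = "http://127.0.0.1:8080/api/v1"


def page_urls(pages, root=API_ROOT):
    """Map each page name to an absolute URL below the API root."""
    # Plain concatenation is used on purpose: urljoin would discard the
    # /api/v1 prefix because the page URLs start with a slash.
    return {page["name"]: root.rstrip("/") + page["url"] for page in pages}
```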

Status Codes:

GET /api/v1/pages/(batch)/(path: file)

Retrieves the file at the path file from the batch identified by batch.

** Request **

GET /pages/:batch/:path

** Response **

HTTP/1.1 200 OK
Content-Type: application/octet-stream

...

Parameters:

  • batch (str) – batch’s unique id
  • file (path) – path to the batch’s file

Status Codes:

POST /api/v1/batch/(batch_id)

Executes the batch identified by batch_id.

** Request **

POST /batch/:batch_id

** Response **

HTTP/1.1 202 ACCEPTED

Parameters:

  • batch_id (string) – batch’s unique id

Status Codes:

GET /api/v1/batch/(batch_id)

Retrieves the state of batch batch_id.

** Request **

GET /batch/:batch_id

** Response **

HTTP/1.1 200 OK

Parameters:
  • batch_id (string) – batch identifier
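Taken together, the endpoints in this reference suggest a session of five kinds of requests. The Python sketch below only lists them in one plausible order; the reference does not say whether pages must be added before tasks, so that ordering is an assumption, and workflow is a hypothetical helper. Paths are relative to the API root (/api/v1).

```python
def workflow(batch_id, group, task):
    """List the (method, path) pairs of a typical session, in call order."""
    return [
        ("POST", "/batch"),                                   # create the batch
        ("POST", f"/batch/{batch_id}/pages"),                 # upload input files
        ("POST", f"/batch/{batch_id}/tasks/{group}/{task}"),  # configure a task
        ("POST", f"/batch/{batch_id}"),                       # start execution
        ("GET", f"/batch/{batch_id}"),                        # check its state
    ]


for method, path in workflow("78a1f1e4-cc76-40ce-8a98-77b54362a00e", "ocr", "kraken"):
    print(method, path)
```

The task request is repeated once per task to configure, and the final GET can be repeated until the batch has finished.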
Status Codes: