OCR text and handwritten forms using Captricity. Captricity's big advantage over Abbyy Cloud OCR is that it lets the user easily specify the position of the text blocks they want to OCR through a simple web-based UI. The quality of the OCR can be checked using `compare_txt` from recognize.

To get the latest version on CRAN:

    install.packages("captr")

To get the current development version from GitHub:

    install.packages("devtools")
    devtools::install_github("soodoku/captr", build_vignettes = TRUE)

### Using captr

Read the package vignette, or follow the overview below.

Start by getting an application token and setting it using:

    set_token("token")

Then, create a batch using:

    create_batch("batch_name")

Once you have created a batch, you need a template ID. Captricity requires a template, which tells it what data to pull from where; templates can be created using the web UI. Set the template ID using:

    set_template_id("id")

Next, assign the template ID to the batch.

Next, upload image(s) to the batch.

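
The two steps above (assigning the template and uploading images) can be sketched as one small helper. This is a minimal sketch, not the package's documented usage: it assumes captr exports `set_batch_template()` and `upload_image()`, and that they take `(batch_id, template_id)` and `(batch_id, path_to_image)` as their first two arguments. Check both against the package documentation. The helper name `prepare_batch` and its `assign_fn` and `upload_fn` arguments are hypothetical, added so the logic can be tried with stand-in functions.

```r
# Hypothetical helper: assign a template to a batch, then upload each image.
# ASSUMPTION: captr::set_batch_template() and captr::upload_image() exist and
# accept (batch_id, template_id) and (batch_id, path_to_image) positionally.
prepare_batch <- function(batch_id, template_id, image_paths,
                          assign_fn = captr::set_batch_template,
                          upload_fn = captr::upload_image) {
  # Fail early on missing files instead of part-way through the uploads.
  missing_files <- image_paths[!file.exists(image_paths)]
  if (length(missing_files) > 0) {
    stop("Image file(s) not found: ", paste(missing_files, collapse = ", "))
  }

  assign_fn(batch_id, template_id)

  # Upload one image at a time; keep each response, named by file path.
  responses <- lapply(image_paths, function(path) upload_fn(batch_id, path))
  names(responses) <- image_paths
  invisible(responses)
}
```

With a valid token set, a call might look like `prepare_batch("batch_id", "template_id", c("form1.png", "form2.png"))`, where the file names are placeholders.
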
Next, check whether the batch is ready to be processed:

    test_readiness(batch_id = "batch_id")

You may also want to find out how much processing the batch will cost:

    batch_price(batch_id = "batch_id")

Once you are ready, submit the batch using `submit_batch`.

Captricity's nomenclature can be confusing: once a batch is submitted, it is called a job. The ID for the job can be obtained from the list returned by `submit_batch`; the field name is `related_job_id`.

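
As a minimal sketch of the hand-off from batch to job, the helper below pulls `related_job_id` out of the list that `submit_batch` returns, as described above. The helper name `job_id_from_submission` is hypothetical, and the `batch_id` argument name in the commented usage is an assumption based on the other batch functions.

```r
# Hypothetical helper: extract the job ID from a submit_batch() result.
job_id_from_submission <- function(submission) {
  job_id <- submission[["related_job_id"]]
  if (is.null(job_id)) {
    stop("No 'related_job_id' field in the submit_batch() result.")
  }
  job_id
}

# Usage (needs a valid token and a batch that is ready to be processed):
# submission <- captr::submit_batch(batch_id = "batch_id")  # argument name assumed
# job_id <- job_id_from_submission(submission)
# captr::track_progress(job_id = job_id)
```

Stopping when the field is absent makes a failed submission visible immediately, rather than passing `NULL` on to later calls.
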
To track progress of a job, use:

    track_progress(job_id = "job_id")

You can list all forms (instance sets) associated with a job using `list_instance_sets`.

To download data from a particular form, use `list_instance_sets` to get the form (instance_set) ID first.

You can also get a CSV of all your results from a job.

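
The retrieval steps above can be sketched as a single wrapper. This is a sketch under stated assumptions: `list_instance_sets` is named in the text, but `get_instance_set()` and `get_all()` are assumed to be the captr functions for fetching one form and all results, and all three are assumed to take the relevant ID as their first argument. The wrapper name `fetch_job_results` and its `*_fn` arguments are hypothetical.

```r
# Hypothetical wrapper around the three retrieval steps.
# ASSUMPTION: captr::get_instance_set() and captr::get_all() exist and take the
# instance set ID and the job ID, respectively, as their first argument.
fetch_job_results <- function(job_id, instance_set_id = NULL,
                              list_fn = captr::list_instance_sets,
                              get_one_fn = captr::get_instance_set,
                              get_all_fn = captr::get_all) {
  # Always list the forms so the caller can look up instance set IDs.
  forms <- list_fn(job_id)

  # Fetch a single form when an ID is given, otherwise the job's full results.
  data <- if (is.null(instance_set_id)) {
    get_all_fn(job_id)
  } else {
    get_one_fn(instance_set_id)
  }

  list(forms = forms, data = data)
}
```

Calling `fetch_job_results("job_id")` would request all results for the job, while passing `instance_set_id` limits the download to one form.
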
Scripts are released under the MIT License.
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.