The client needed a tool that efficiently parses various structured entities from unstructured, OCR-recognized text.
Deliverables:
- Entity names parser
- Address parser
- Date parser
- Quantities and measures parser
- Money parser
- Efficient mechanism to minimize false detections
- Mechanism for automated labelling of the training dataset
Technology stack:
Python 2.7, TensorFlow, Stanford CoreNLP, spaCy, GROBID, datefinder, NLTK, quantulum.
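
To illustrate how a date parser and a false-detection filter from the deliverables list might fit together, here is a minimal sketch. It is not the project's actual implementation: it uses only the Python standard library rather than datefinder, and the pattern `DATE_CANDIDATE` and the function `parse_dates` are hypothetical names that assume day/month/year ordering. The idea is to extract candidates with a permissive pattern, then discard any that fail a validity check.

```python
import re
from datetime import date

# Hypothetical pattern (an assumption, not from the project):
# day/month/year with '/', '.' or '-' as separators.
DATE_CANDIDATE = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")


def parse_dates(text):
    """Return the valid calendar dates found in text, in order of appearance."""
    found = []
    for match in DATE_CANDIDATE.finditer(text):
        day, month, year = (int(group) for group in match.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            # The candidate matched the pattern but is not a real date
            # (for example 31/02/2019), so it is treated as a false detection.
            continue
    return found


print(parse_dates("Invoice dated 12/03/2018, due 31/02/2019."))
```

Here the second candidate is rejected because February has no 31st day, so only 12 March 2018 is returned. The same extract-then-validate split could plausibly apply to the money and quantity parsers as well.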