R&D Partner Search

Eurostars / Eureka partner search: R&D institution or SME experienced with Optical Character Recognition engines, able to provide both high-level requirements and fine-grained technical details

Diffusion

International

Reference

56519

International Ref.

RDES20170821001

Deadline

24-08-2018

Abstract

A Spanish SME specialised in enterprise content management (ECM) and document capture is looking for suitable SMEs or R&D institutions to join a Eurostars proposal for the next cut-off deadline (14 September 2017). The partner should be experienced with Optical Character Recognition (OCR) engines and able to provide both high-level requirements and fine-grained technical details. The expected outcome of the project is to go beyond what current OCR engines offer on the market.

Description

The project proposes the development of a new, fault-tolerant, open source OCR/ICR solution with features comparable to those of some commercial products, providing reliable extraction from documents with any layout, typography and/or writing style (both typed and handwritten).

The enterprise content management (ECM) and document capture markets clearly need innovative solutions that can relieve human work. The ECM market is expected to be worth 12 billion dollars in 2019. It is a huge market because organizations of any size require some kind of content management solution. The closely related document capture market grew by more than 6% last year, reaching a volume of 2 billion dollars.

Having powerful, flexible, easily adaptable and extensible OCR/ICR tools at hand has become a fundamental need for many organizations in their document management processes. Current OCR tools are mainly based on pattern recognition (PR) techniques developed over the past twenty years, such as support vector machines or simple nearest-neighbour classifiers. In many cases, these tools do not take advantage of contextual information to improve OCR results. This contextual information consists mainly of linguistic resources, such as vocabularies and lexicons, which usually reside within the companies that use these tools. With this project, the company intends to develop a series of open source OCR tools that will exploit these linguistic resources to build language models and to handle difficult documents.
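To make the idea of exploiting a company's own vocabulary more concrete, the following Python sketch shows one simple way a lexicon can improve raw OCR output. It is an assumed illustration, not the project's method: each recognized token that is not in the lexicon is replaced by the nearest lexicon entry under Levenshtein edit distance, provided that entry is close enough. The function names and the max_distance threshold are hypothetical; the project itself aims to build language models from these resources, which is a more powerful use of the same data.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_tokens(tokens, lexicon, max_distance=2):
    """Snap each OCR token to its nearest lexicon word if within max_distance.

    Tokens already in the lexicon, or with no entry close enough, are kept
    unchanged. Ties are broken by alphabetical order of the lexicon."""
    vocab = sorted(set(lexicon))
    if not vocab:
        return list(tokens)
    vocab_set = set(vocab)
    corrected = []
    for tok in tokens:
        if tok in vocab_set:
            corrected.append(tok)
            continue
        best = min(vocab, key=lambda w: levenshtein(tok, w))
        corrected.append(best if levenshtein(tok, best) <= max_distance else tok)
    return corrected


# Hypothetical company lexicon and noisy OCR tokens
lexicon = ["invoice", "total", "amount", "date", "customer"]
print(correct_tokens(["invo1ce", "tota1", "amount", "xyz123"], lexicon))
```

Unknown strings such as "xyz123" are left untouched because no lexicon entry lies within the threshold, which avoids forcing every token into the vocabulary.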

In recent years, a new technology based on deep learning techniques has strongly emerged in many pattern recognition problems, including handwritten text recognition (HTR). These powerful techniques can easily be extended to implement OCR systems that handle difficult documents, and some research teams intend to develop such tools in the near future. One advantage of deep learning techniques is that they can classify the sample to be recognized very quickly. Another is that many basic open source tools already exist for performing the core processing. With this project, the company intends to develop a series of free software tools both for automatically generating training data and for training a deep learning based OCR system.

The company is targeting Eurostars cut-off 8, whose deadline is 14/09/2017. As this deadline is very close, the company is also considering preparing a proposal for the next Eurostars cut-off or for a Eureka network project (always open).
EOI deadline: 07 September 2017
Project duration: approx. 1.5 years.

Innovative Aspects and Main Advantages of the Offer

The project intends to go beyond the state of the art by developing tools that recognize printed documents using handwritten text recognition (HTR) techniques, which we will refer to from now on as OCR-HTR techniques. These techniques should be effective for printed documents on which current OCR techniques fail to obtain good results. The main foundations of these tools will be: i) the technology will be based on a hybrid combination of Deep Neural Networks and Hidden Markov Models (DNN-HMM) for optical modeling; ii) n-gram language models will be used in the recognition/decoding system; iii) word generation and index preparation will be implemented to make collections of printed documents searchable.
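The role of the n-gram models in point ii) can be illustrated with a minimal Python sketch under simplifying assumptions: a word bigram model with add-one (Laplace) smoothing is trained on a few sentences and then used to rescore an n-best list of (text, optical log-score) pairs, keeping the hypothesis that maximizes the optical score plus a weighted language model score. Production DNN-HMM decoders typically integrate the language model into the search itself rather than rescoring afterwards. BigramLM, rescore and lm_weight are hypothetical names, and the optical scores in the example are invented illustrative values.

```python
import math
from collections import Counter


class BigramLM:
    """Word bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()  # counts of each word used as a history
        self.bigrams = Counter()   # counts of (previous word, word) pairs
        for sent in sentences:
            words = ["<s>"] + sent.split() + ["</s>"]
            self.unigrams.update(words[:-1])
            self.bigrams.update(zip(words[:-1], words[1:]))
        # Vocabulary of predictable tokens: training words plus end-of-sentence
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def log_prob(self, sentence):
        """Natural-log probability of a sentence under the smoothed bigram model."""
        words = ["<s>"] + sentence.split() + ["</s>"]
        v = len(self.vocab)
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + v))
            for prev, cur in zip(words[:-1], words[1:])
        )


def rescore(hypotheses, lm, lm_weight=1.0):
    """Return the text maximizing optical_log_score + lm_weight * LM log-prob.

    `hypotheses` is a list of (text, optical_log_score) pairs, e.g. an
    n-best list produced by an optical model (assumed input format)."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm.log_prob(h[0]))[0]


lm = BigramLM(["total amount due", "invoice total amount", "amount due date"])
hyps = [("total arnount due", -4.0), ("total amount due", -4.3)]
print(rescore(hyps, lm))
```

With these numbers the language model overrides the optical model's slight preference for the misrecognized "arnount", because the corrected word sequence occurs in the training text.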

The main technological outcomes of the project include the following contributions to the open source community:

- Industry-ready OCR and ICR built from permissively licensed open source components (BSD, MIT or ASL 2.0) that would perform as well as far more expensive commercial products.

- An API for the OCR-HTR system, tailored for use by content management systems and full-text retrieval systems, especially Apache Lucene/Solr and Elasticsearch.
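As a rough illustration of what an indexing-oriented API might hand to a search engine, the sketch below converts hypothetical per-word OCR results (text, confidence, bounding box) into a JSON-serializable document. Both Solr and Elasticsearch can ingest JSON documents, but the field names used here (id, page, content, low_confidence_terms, words) and the min_confidence threshold are assumptions rather than an existing schema, and the step of actually sending the document to a search engine is omitted.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float                # recognizer confidence in [0, 1]
    bbox: tuple[int, int, int, int]  # x, y, width, height in page pixels


def to_index_document(doc_id, page, words, min_confidence=0.5):
    """Build a JSON-serializable page document for a full-text search engine.

    Confident words form the searchable `content` field; doubtful words are
    listed separately; every word keeps its box so hits can be highlighted.
    All field names are hypothetical."""
    kept = [w for w in words if w.confidence >= min_confidence]
    doubtful = [w.text for w in words if w.confidence < min_confidence]
    return {
        "id": f"{doc_id}-p{page}",
        "page": page,
        "content": " ".join(w.text for w in kept),
        "low_confidence_terms": doubtful,
        "words": [
            {"text": w.text, "conf": round(w.confidence, 3), "bbox": list(w.bbox)}
            for w in words
        ],
    }


# Invented example output of an OCR engine for one page
words = [
    OcrWord("Invoice", 0.98, (10, 10, 80, 20)),
    OcrWord("N0.", 0.41, (95, 10, 30, 20)),
    OcrWord("2017-09", 0.87, (130, 10, 70, 20)),
]
doc = to_index_document("inv-001", 1, words)
print(json.dumps(doc, indent=2))
```

Keeping low-confidence words out of the main text field but still storing them is one possible design choice; an alternative would be to index every hypothesis with its confidence as a relevance weight.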

Type of partner sought

The ideal partner is an SME (or R&D institution) in the enterprise content management, document analysis or document capture market that can strengthen its own products and solutions with this new deep learning based OCR and ICR platform. The partner should have expertise in:
• Natural Language Processing
• Image Processing
• OCR Technologies
• Artificial Intelligence
• Big Data

Specific area of activity of the partner

Part of the technical development will be subcontracted to a university R&D group with which the company has already built an initial prototype. The company is now seeking a technical and/or commercial partner with similar use cases, or one interested in the project's expected outcomes beyond what current OCR engines offer on the market. The ideal partner is an SME in the enterprise content management, document analysis or document capture market that can strengthen its own products and solutions with this new deep learning based OCR and ICR platform.

Therefore, the company is seeking an integrator partner experienced with OCR engines, able to provide both high-level requirements and fine-grained technical details about the new engine's capabilities. A concrete use case around OCR/ICR/HTR technologies is preferred, ideally one in which the partner has run into a barrier in the current state of the art of these technologies.

Task to be performed

SME 11-50, R&D Institution, SME 51-250

Similar profiles

• Eurostars project: Smart IT platform for profiling digital forensics crimes
• EUROSTARS proposal: Monitoring and protection system for critical wired and wireless communication infrastructures
• EUREKA/Eurostars: agriculture company that provides post-harvest services or post-harvest treatments
• Eurostars: looking for vocational education training organisations, companies buying vocational education sessions and social network of employment industry
• EUREKA/Eurostars: expertise in the development and validation of real-time quantitative PCR (rt-QPCR) diagnostic assays with in vitro diagnostic (IDV) label for human diagnosis sought