Search millions of documents stored in many different sources?
Deltares is a large, international technology institute in the fields of water, subsurface and infrastructure. Deltares has various sources with millions of documents, and every day hundreds more are added.
To start with, we sat down with Deltares to make the biggest, most important source accessible: a document share containing more than 15 million documents. We looked at what kinds of documents it holds and what metadata is attached to them, such as publication date, author, title and file extension, as well as the content of each document itself. We then advised Deltares on which data to use to give the best search experience and make documents easiest to find.
On the basis of this data, we first designed the user interface: the pages on which users search and view their results. Such a user interface is always 100% tailor-made and follows the house style, for example its colors and fonts. If no digital house style exists yet, we design one.
We then crawled all the documents, gathered all the relevant (meta)data and wrote it to an Elasticsearch index. Next, we developed the underlying APIs, which allow the search front end to send smart, targeted queries (in JSON) and receive the answers (again in JSON). The front end then presents these answers to the user.
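To illustrate how a JSON request from the front end could be translated into an Elasticsearch query, here is a minimal sketch in Python. It is not the Confind implementation: the function name `build_search_body` and the index field names (`title`, `author`, `content`, `file_extension`, `publication_date`) are assumptions based on the metadata mentioned earlier. Only standard Elasticsearch query DSL clauses (`bool`, `multi_match`, `term`, `range`) are used.

```python
import json
from typing import Optional


def build_search_body(text: str,
                      extension: Optional[str] = None,
                      published_after: Optional[str] = None,
                      size: int = 10) -> dict:
    """Translate a front-end search request into an Elasticsearch query body.

    All field names below are hypothetical; a real index would use
    whatever names were chosen when the documents were crawled.
    """
    filters = []
    if extension:
        # Exact match on the (assumed) keyword field holding the extension.
        filters.append({"term": {"file_extension": extension.lower()}})
    if published_after:
        # Keep only documents published on or after the given ISO date.
        filters.append({"range": {"publication_date": {"gte": published_after}}})

    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text part: title matches are boosted over author/content.
                "must": [{
                    "multi_match": {
                        "query": text,
                        "fields": ["title^2", "author", "content"],
                    }
                }],
                # Filters narrow the result set without affecting the score.
                "filter": filters,
            }
        },
    }


body = build_search_body("dike stability", extension="PDF",
                         published_after="2015-01-01")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a search request. Putting the metadata constraints in `filter` rather than `must` keeps them out of relevance scoring, which is a common choice for facets such as file type or date.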
In the case of Deltares, we also linked our search solution to Active Directory, as all documents are subject to access rights. This means we know who is asking which question, and which answers we can and cannot display in the results.
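One common way to enforce such rights at query time is to store the groups that may read a document alongside it in the index, and to add a filter with the user's group memberships to every query. The sketch below shows that idea only; it is an assumption, not a description of how Confind does it. The function `with_permission_filter`, the field `allowed_groups` and the group names are hypothetical, and looking up the user's groups in Active Directory is left out.

```python
from typing import Iterable


def with_permission_filter(body: dict, user_groups: Iterable[str]) -> dict:
    """Return a copy of the query body restricted to documents the user may read.

    Assumes each indexed document carries a hypothetical `allowed_groups`
    field listing the directory groups that have read access.
    """
    groups = sorted(set(user_groups))
    # Fall back to matching everything if the body has no query yet.
    original_query = body.get("query", {"match_all": {}})

    secured = dict(body)  # shallow copy, so the caller's body is not modified
    secured["query"] = {
        "bool": {
            "must": [original_query],
            # Only documents sharing at least one group with the user remain.
            "filter": [{"terms": {"allowed_groups": groups}}],
        }
    }
    return secured


body = {"query": {"match": {"content": "groundwater"}}, "size": 10}
secured = with_permission_filter(body, ["GRP-Hydrology", "GRP-All-Staff"])
```

Because the filter is applied inside the search engine, documents the user may not see never reach the front end, and result counts stay consistent with what is displayed.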
An Enterprise Search Platform based on Elasticsearch: Confind
- Hosting by Deltares;
- Monitoring and management by Smartshore & Ability;
- Admin Module for functional management;
- Kibana for usage insights;
- Frontend based on Vue.js and Nuxt;
- APIs for search, suggestions, indexing, etc.;
- Integration with Active Directory.