RELIANCE Text Mining Services in EOSC


RELIANCE text mining and enrichment services integrated in EOSC enable researchers from Earth Science Communities and Copernicus Users to leverage the wealth of knowledge in scientific publications and Research Objects.

Enrichment


The semantic enrichment process generates new metadata from the text content of files or collections of files, such as Research Objects. This metadata comprises the main concepts found in resources containing text, the main knowledge areas (domains) in which these concepts are most frequently used, the main expressions found in the text (known in computational linguistics as noun phrases), and named entities, which are further classified into people, organizations and places. The core of the semantic enrichment process is the expert.ai software. Expert.ai uses a proprietary semantic network in which words are grouped into concepts together with other words sharing the same meaning, and concepts are related to one another by linguistic relations such as hypernymy and hyponymy, among many others. Because every concept is grounded in this semantic network, the semantics of the generated metadata is explicit.
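The grouping of synonymous words into concepts, and the hypernym links between concepts, can be pictured with a toy network. The following is only a minimal sketch with made-up words, concept IDs and relations; it is not expert.ai's proprietary network or API, and it models a single relation (hypernymy).

```python
from __future__ import annotations

# HYPOTHETICAL toy network for illustration only; not expert.ai's data or API.
# Words that share a meaning are grouped into the same concept node.
WORD_TO_CONCEPT: dict[str, str] = {
    "volcano": "C_VOLCANO",
    "volcanic mountain": "C_VOLCANO",
    "mountain": "C_MOUNTAIN",
    "landform": "C_LANDFORM",
}

# Concepts are linked by linguistic relations; only hypernymy ("is a kind of") here.
HYPERNYM: dict[str, str] = {
    "C_VOLCANO": "C_MOUNTAIN",
    "C_MOUNTAIN": "C_LANDFORM",
}


def ground(word: str) -> str | None:
    """Map a surface word to the concept it denotes, or None if it is unknown."""
    return WORD_TO_CONCEPT.get(word.lower())


def hypernym_chain(concept: str) -> list[str]:
    """Follow hypernym links upwards until the most general concept is reached."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


# Two different strings resolve to the same concept, so they yield the same metadata.
print(ground("Volcano") == ground("volcanic mountain"))  # True
print(hypernym_chain("C_VOLCANO"))  # ['C_VOLCANO', 'C_MOUNTAIN', 'C_LANDFORM']
```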

Information retrieval processes, including search engines and recommendation systems, can benefit from working with concepts instead of the character strings that represent words: this mainly yields more complete and accurate result sets, and it enables exploring collections of files and research objects through facets wherever semantic metadata is available.
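To illustrate the difference between matching strings and matching concepts, the sketch below runs both kinds of search over a tiny hypothetical collection whose documents have already been annotated with concept IDs. The documents, the concept IDs and the word-to-concept lookup are all invented for the example, and the facet count is a plain frequency count rather than anything the RELIANCE services are known to compute.

```python
from collections import Counter

# HYPOTHETICAL mini-collection: each document carries concept annotations,
# as an enrichment step would add them.
DOCS = {
    "ro-1": {"text": "Deformation of the volcano measured with InSAR", "concepts": {"C_VOLCANO", "C_INSAR"}},
    "ro-2": {"text": "Eruption monitoring on a volcanic mountain", "concepts": {"C_VOLCANO", "C_ERUPTION"}},
    "ro-3": {"text": "Sea level trends from altimetry", "concepts": {"C_SEA_LEVEL"}},
}

# HYPOTHETICAL word-to-concept lookup standing in for the semantic network.
LEXICON = {"volcano": "C_VOLCANO", "volcanic mountain": "C_VOLCANO", "insar": "C_INSAR"}


def string_search(query: str) -> list[str]:
    """Baseline: keep documents whose text contains the query string."""
    return sorted(d for d, doc in DOCS.items() if query.lower() in doc["text"].lower())


def concept_search(query: str) -> list[str]:
    """Resolve the query to a concept, then keep documents annotated with it."""
    concept = LEXICON.get(query.lower())
    return sorted(d for d, doc in DOCS.items() if concept in doc["concepts"])


def concept_facets(doc_ids: list[str]) -> Counter:
    """Count concepts across a result set: the values a concept facet would list."""
    return Counter(c for d in doc_ids for c in DOCS[d]["concepts"])


print(string_search("volcano"))   # ['ro-1']
print(concept_search("volcano"))  # ['ro-1', 'ro-2']
```

The string search for "volcano" misses ro-2, which only says "volcanic mountain", while the concept search finds both documents because both phrases resolve to the same concept.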

Document Enrichment

Files must be of one of the following types: Word documents, PDF documents, plain-text files or PowerPoint presentations. The text they contain is fed into expert.ai, which generates metadata representing that content. Expert.ai can identify the following metadata types in the text:

     •   Domains
     •   Main Concepts
     •   Main Lemmas
     •   Main Expressions
     •   Named entities: all the named entities found in the text, classified into People, Organizations and Places.

All these metadata types are returned as annotations in a JSON document. The following example shows how the service can be called.

API example

curl -X POST -F '[email protected]' https://reliance.expertcustomers.ai/eosc/enrichment
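For users working from Python rather than the shell, the following sketch builds an equivalent multipart/form-data upload with the standard library only. The form field name in the curl example above is not legible, so the default field name `file` is an assumption, as is sending the file's guessed MIME type; the function only builds the request and does not send it.

```python
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request

ENRICHMENT_URL = "https://reliance.expertcustomers.ai/eosc/enrichment"


def build_enrichment_request(path: str, field: str = "file") -> Request:
    """Build (but do not send) a multipart/form-data POST, like `curl -F`.

    ASSUMPTION: the form field name defaults to "file"; check the real name
    expected by the service before relying on it.
    """
    p = Path(path)
    boundary = uuid.uuid4().hex  # random boundary that will not occur in the file
    ctype = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{p.name}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        ENRICHMENT_URL,
        data=head + p.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`, since the service returns its annotations as JSON.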




Enriched Research Objects

ROHub uses the enrichment service to add semantic annotations to its research objects (ROs).

Search Index


The search index used by the search service hosts the collection of ROHub research objects that have previously been enriched. The annotations generated by the enrichment service, added to each research object's original metadata, are used to produce more accurate results and to provide new facets for exploring the research object collection. The index is also the core of the recommendation API, which returns recommended research objects from this collection. The goal of the search API is therefore to improve the exploration of the research object collection hosted by ROHub and to let users run faceted and semantic searches over it based on the text content of the research objects.

The search index follows a schema that accommodates the metadata obtained from the ROHub platform and the annotations generated by the enrichment API. It has six facets: Concepts (the most frequent concepts mentioned in the text), Expressions (the most relevant phrases or collocations found in the text), Domains (the fields of knowledge in which the main concepts are most commonly used), People, Places and Organizations. The facet fields, together with the rest of each indexed document, are updated every time a research object is created or updated in ROHub. Each indexed document also carries other information about the research object, such as its title, description and creator, which can be accessed through the Search API.
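The facets can be requested with Solr's standard faceting parameters (`facet=true`, `facet.field`, `facet.limit`). The sketch below builds such a URL and converts the facet counts from Solr's default JSON layout, a flat `[value, count, value, count, ...]` list per field, into dictionaries. Only the `place` field name is confirmed by the examples on this page; the other five field names are hypothetical placeholders.

```python
from __future__ import annotations

from urllib.parse import urlencode

SELECT_URL = "https://reliance.expertcustomers.ai/solr/ROHub/select"

# "place" is the only field name confirmed by the examples on this page; the
# other five names are HYPOTHETICAL placeholders for the Concepts, Expressions,
# Domains, People and Organizations facets.
FACET_FIELDS = ("concept", "expression", "domain", "people", "place", "organization")


def facet_url(query: str = "*:*", fields: tuple[str, ...] = FACET_FIELDS, limit: int = 10) -> str:
    """Request facet counts only (rows=0) for the given fields."""
    params = [("q", query), ("rows", 0), ("facet", "true"), ("facet.limit", limit)]
    params += [("facet.field", name) for name in fields]
    return f"{SELECT_URL}?{urlencode(params)}"


def parse_facets(response: dict) -> dict[str, dict[str, int]]:
    """Turn Solr's flat [value, count, value, count, ...] lists into dicts."""
    fields = response.get("facet_counts", {}).get("facet_fields", {})
    return {name: dict(zip(flat[::2], flat[1::2])) for name, flat in fields.items()}
```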

Search API

The index is built on Solr 8.9.0 and can be accessed either with the SolrJ API or by sending queries directly to the service. To do so, you must log in with a standard user account, which grants permission to run search queries against the index. More information about how the Solr API works and which types of queries can be sent to the service can be found on the official Solr site. The following tutorials are a good starting point:

Search for a single term

Field Searches

Phrase Search

Combining Searches

For more advanced topics, see:

Common Query Parameters

The Standard Query Parser
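When querying the index directly over HTTP, the standard user account is passed with HTTP Basic authentication, which is what `curl --user` sends. A minimal standard-library sketch, assuming the `standard_user` credentials and the `ROHub` core used in the query examples below (Solr 8 returns JSON by default):

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Select handler of the ROHub core, as used in the query examples.
SOLR_SELECT = "https://reliance.expertcustomers.ai/solr/ROHub/select"


def solr_request(params: dict, user: str = "standard_user", password: str = "standard_user") -> Request:
    """Build a GET request to the select handler with HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(f"{SOLR_SELECT}?{urlencode(params)}", headers={"Authorization": f"Basic {token}"})


def solr_search(params: dict) -> dict:
    """Send the request and decode Solr's JSON response (performs network I/O)."""
    with urlopen(solr_request(params)) as resp:
        return json.load(resp)
```

For instance, `solr_search({"q": "inSAR"})["response"]["numFound"]` would give the number of matching research objects.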

Query example

The following query returns a JSON document listing the research objects that contain the word "inSAR":
curl --user standard_user:standard_user -H "Content-Type: application/json" "https://reliance.expertcustomers.ai/solr/ROHub/select?q=inSAR"
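Solr returns at most `rows` documents per request (10 by default), together with the total `numFound` under the `response` key, so collecting every hit means paging with `start`. The generator below is a sketch of that loop; `fetch` is a hypothetical stand-in for whatever sends the parameters to the select handler (for example, a wrapper around the curl call above) and returns the decoded JSON.

```python
from typing import Callable, Iterator


def iter_docs(fetch: Callable[[dict], dict], query: str, page_size: int = 10) -> Iterator[dict]:
    """Yield every matching document, one page (`rows`) at a time.

    `fetch` is a HYPOTHETICAL callable that sends the given parameters to the
    Solr select handler and returns the decoded JSON response.
    """
    start = 0
    while True:
        page = fetch({"q": query, "start": start, "rows": page_size})["response"]
        yield from page["docs"]
        start += page_size
        if start >= page["numFound"]:  # no further pages
            break
```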



Facet Query example

The following query returns a JSON document listing the research objects that mention "Augustine Volcano" as one of their places:
curl --user standard_user:standard_user -H "Content-Type: application/json" "https://reliance.expertcustomers.ai/solr/ROHub/select?fq=place:%22Augustine%20Volcano%22&q=*:*"
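In Solr's standard query parser a field prefix applies only to the term directly after it, so a multi-word value such as a place name has to be quoted as a phrase to be matched as a whole in the `place` field. The helper below is a small sketch that quotes and escapes the value and URL-encodes the resulting filter query; the function names are illustrative.

```python
from urllib.parse import urlencode

SELECT_URL = "https://reliance.expertcustomers.ai/solr/ROHub/select"


def field_phrase(field: str, value: str) -> str:
    """Quote a multi-word value so the field prefix covers the whole phrase.

    Without quotes, `place:Augustine Volcano` restricts only "Augustine" to the
    `place` field; "Volcano" would be searched in the default field instead.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')  # escape inside the phrase
    return f'{field}:"{escaped}"'


def facet_filter_url(field: str, value: str, query: str = "*:*") -> str:
    """Build a select URL that filters (fq) on one facet value."""
    return f"{SELECT_URL}?{urlencode({'q': query, 'fq': field_phrase(field, value)})}"
```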



Recommendation


The recommendation system suggests research objects that might be of interest according to the user’s research interests. It follows a content-based approach: it compares the content of research objects with the user’s interests to draw up the list of recommended items. This comparison is based on the annotations added by the semantic enrichment process. The user’s interests are identified from the top concepts in the user’s own research objects, and these concepts are then compared with the concepts annotating the research objects in the whole collection. The user’s interests can be extended by (i) adding specific research objects from other users or (ii) adding another scientist. In the former case, the main concepts of the research object are added to the user’s interests; in the latter, that scientist’s interests are added. The recommendation system has a REST API and a web user interface called Collaboration Spheres.
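The content-based approach described above can be sketched in a few lines. This is not the service's implementation: the toy data, the top-k cut-off for a user's interests and the Jaccard overlap used for ranking are all assumptions chosen for illustration, since the actual scoring function is not described here.

```python
from collections import Counter

# HYPOTHETICAL toy data: research object -> concepts from the enrichment step,
# and research object -> owning scientist.
RO_CONCEPTS = {
    "ro-a": {"volcano", "insar", "deformation"},
    "ro-b": {"volcano", "eruption"},
    "ro-c": {"insar", "deformation", "subsidence"},
    "ro-d": {"sea level", "altimetry"},
    "ro-e": {"volcano", "eruption"},
}
RO_OWNER = {"ro-a": "alice", "ro-b": "alice", "ro-c": "bob", "ro-d": "carol", "ro-e": "carol"}


def interests(scientist: str, k: int = 5) -> set:
    """Top-k most frequent concepts across a scientist's own research objects."""
    counts = Counter(c for ro, owner in RO_OWNER.items() if owner == scientist for c in RO_CONCEPTS[ro])
    return {concept for concept, _ in counts.most_common(k)}


def jaccard(a: set, b: set) -> float:
    """Overlap between two concept sets (ASSUMED similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, context_ros=(), context_scientists=(), k=5, n=3):
    """Rank other users' research objects by concept overlap with the user's profile."""
    profile = set(interests(user, k))
    for ro in context_ros:  # (i) add the main concepts of a chosen research object
        profile |= RO_CONCEPTS[ro]
    for scientist in context_scientists:  # (ii) add another scientist's interests
        profile |= interests(scientist, k)
    candidates = [ro for ro, owner in RO_OWNER.items() if owner != user and ro not in context_ros]
    # Highest overlap first; ties broken by id so the output is deterministic.
    ranked = sorted(candidates, key=lambda ro: (-jaccard(profile, RO_CONCEPTS[ro]), ro))
    return ranked[:n]


print(recommend("alice"))  # ['ro-e', 'ro-c', 'ro-d']
```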

Recommendation API

The recommendation service REST API accepts POST requests and returns a JSON document with the list of research objects that make up the recommendation. The service is currently deployed at: http://reliance.expertcustomers.ai/spheresbackend/services/jsonservices/api. To include research objects or scientists in the recommendation context, the service accepts a JSON document of the form {"ros": ["uri-1", ...], "scientists": ["uri-2", ...]}, where "ros" is an array of URIs of the research objects to add to the recommendation context and "scientists" is an array of URIs of the users to add to it. To be consistent with the definition of context in Collaboration Spheres, at most three URIs (research objects, users or a combination of both) can be added to the recommendation context. The following example shows how the service can be called.
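Before calling the service, a client can build the context document and enforce the three-URI limit itself. A minimal sketch; the function name is hypothetical and the check mirrors only the limit stated above:

```python
import json

MAX_CONTEXT = 3  # at most three URIs (research objects and/or users) per context


def recommendation_payload(ros=(), scientists=()) -> str:
    """Serialize the context as {"ros": [...], "scientists": [...]}."""
    ros, scientists = list(ros), list(scientists)
    if len(ros) + len(scientists) > MAX_CONTEXT:
        raise ValueError(f"at most {MAX_CONTEXT} URIs can be added to the recommendation context")
    return json.dumps({"ros": ros, "scientists": scientists})
```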

API example

curl -d '{"ros": ["https://w3id.org/ro-id/038179f2-f2dc-4cd6-a8ab-28765fb35950"], "scientists":[]}' -H "Content-Type: application/json" -X POST https://reliance.expertcustomers.ai/spheresbackend/services/jsonservices/api
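The same call can be made from Python with the standard library. The sketch below mirrors the curl example (JSON body, `Content-Type: application/json`, POST) and makes no assumption about the structure of the returned JSON beyond it being valid JSON.

```python
import json
from urllib.request import Request, urlopen

RECOMMEND_URL = "https://reliance.expertcustomers.ai/spheresbackend/services/jsonservices/api"


def recommendation_request(ros: list, scientists: list) -> Request:
    """Build the same POST as the curl example: a JSON body with the context URIs."""
    body = json.dumps({"ros": ros, "scientists": scientists}).encode("utf-8")
    return Request(RECOMMEND_URL, data=body, headers={"Content-Type": "application/json"}, method="POST")


def fetch_recommendations(ros: list, scientists: list):
    """Send the request and decode the JSON response (performs network I/O)."""
    with urlopen(recommendation_request(ros, scientists)) as resp:
        return json.load(resp)
```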



More materials

EGI Notebook Tutorial

Learn how to invoke our APIs with the Jupyter Notebook we have released in EGI.

It is available under datahub/Reliance/Text_Mining_Tutorial/