Custom Analyzer for ElasticSearch


kdrb...@...
 

Hello,

JanusGraph enables users to utilize ElasticSearch as an additional indexing backend. Instructions on how to accomplish this are available at http://docs.janusgraph.org/latest/index-parameters.html, and following them works fine.

The JanusGraph documentation indicates at http://docs.janusgraph.org/latest/field-mapping.html#_custom_analyser (Paragraph 23.2.1.1) that it should be possible to specify custom analyzers for ElasticSearch. If I understand the documentation correctly, the following steps

customproperty = mgmt.makePropertyKey('customproperty').dataType(String.class).make()
mgmt.buildIndex('customproperty', Vertex.class).addKey(customproperty, Mapping.TEXT.asParameter(), Parameter.of(ParameterType.TEXT_ANALYZER.getName(), 'MY_CUSTOM_ANALYZER_NAME')).buildMixedIndex("search")

should ensure that an index is created with an analyzer named MY_CUSTOM_ANALYZER_NAME. This is exactly what I am trying to achieve, so far without success. I can verify that the index is indeed created; however, when querying its settings with http://localhost:9200/janusgraph_customproperty/_settings I get

{
  "janusgraph_customproperty": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "janusgraph_customproperty",
        "max_result_window": "2147483647",
        "creation_date": "1524156242828",
        "number_of_replicas": "1",
        "uuid": "MsELAZBRT8ONf_Jjk0Yavw",
        "version": {
          "created": "5060399"
        }
      }
    }
  }
}

without any reference to the custom analyzer. Such a reference is visible, however, when an index is created manually, for instance with:

curl -XPUT 'localhost:9200/email_index?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "uax_url_email",
          "max_token_length": 5
        }
      }
    }
  }
}
'

Querying it with http://localhost:9200/email_index/_settings returns:

{
  "email_index": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "email_index",
        "creation_date": "1524156863109",
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "type": "uax_url_email",
              "max_token_length": "5"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "xZS2_jGAR8qM5kUjozJRBw",
        "version": {
          "created": "5060399"
        }
      }
    }
  }
}
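
To make the comparison between the two responses explicit, here is a minimal Python sketch. It is only an illustration and not part of JanusGraph or ElasticSearch: the helper name `defined_analyzers` is hypothetical, and the two dictionaries are trimmed copies of the `_settings` responses shown above. It assumes that a custom analyzer defined on an index appears under `settings.index.analysis.analyzer`, as it does for the manually created index.

```python
import json


def defined_analyzers(settings_response, index_name):
    """Return the sorted names of analyzers declared in an index's settings.

    Hypothetical helper: it only inspects a parsed `_settings` response
    and does not talk to ElasticSearch itself.
    """
    index_settings = (
        settings_response.get(index_name, {})
        .get("settings", {})
        .get("index", {})
    )
    analyzers = index_settings.get("analysis", {}).get("analyzer", {})
    return sorted(analyzers)


# Trimmed copy of the response for the index created through JanusGraph.
janusgraph_response = json.loads("""
{
  "janusgraph_customproperty": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "janusgraph_customproperty",
        "number_of_replicas": "1"
      }
    }
  }
}
""")

# Trimmed copy of the response for the index created manually with curl.
email_response = json.loads("""
{
  "email_index": {
    "settings": {
      "index": {
        "provided_name": "email_index",
        "analysis": {
          "analyzer": {"my_analyzer": {"tokenizer": "my_tokenizer"}},
          "tokenizer": {
            "my_tokenizer": {"type": "uax_url_email", "max_token_length": "5"}
          }
        }
      }
    }
  }
}
""")

print(defined_analyzers(janusgraph_response, "janusgraph_customproperty"))
print(defined_analyzers(email_response, "email_index"))
```

With the responses above, the first call yields an empty list and the second yields only `my_analyzer`, which is the difference I am describing.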

This manual index creation brings me to another point. In the curl request I am able to specify all settings of the index (such as the tokenizer to use). Am I correct to assume that this cannot be accomplished through the management API at this point? Frankly, the more urgent question for me is whether the use case described above is supposed to work the way I interpreted it.
Since my colleagues and I are using JanusGraph in combination with ElasticSearch in any case, we wouldn't mind putting some work into this.

Regards
