Custom Analyzer for ElasticSearch


kdr b <kdrb...@...>
 

Hello Thomas, 

Thank you for your informative response. I was not aware that index templates existed in ES at all. I had been thinking of working around the problem by simply changing the indices after creation to suit my needs, but your approach is the better solution. Nevertheless, I am still wondering whether it makes sense to extend JanusGraph so that all index settings can be specified when creating a mixed index. I am aware that such a solution would have to work across all indexing backends, and honestly I have no idea how reasonable or feasible that is. But if it makes sense, I can discuss it with my colleagues and take a shot at it.

Regards

On Monday, 23 April 2018 at 14:58:15 UTC+2, thomas prelle wrote:



tpr...@...
 

Hi,
This thread should be in janusgraph-users.

Your problem is that your analyzer is not registered. You can use the custom analyzer parameter without registering an analyzer only if you name a basic analyzer that ES already knows, such as the english analyzer.

To achieve what you want, you have two options:
 - add an index template beforehand that defines your analyzer and matches your index: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/indices-templates.html
 - set elasticsearch.create.use-external-mappings to true, and put your mappings in place before registering your index
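A minimal sketch of the first option, written in Python purely for illustration. It only builds the JSON body of an index template; nothing is sent to a cluster. The pattern `janusgraph_*` and the helper name `build_index_template` are assumptions, while the analyzer and tokenizer values are copied from the curl example in the original question. The pattern field differs between versions: ES 6.x expects `index_patterns`, whereas ES 5.x (the `5060399` version shown in the question suggests a 5.6 cluster) expects `template`.

```python
import json


def build_index_template(index_pattern, analyzer_name, tokenizer_name, es_major=6):
    """Build the JSON body of an index template that registers one analyzer.

    Hypothetical helper for illustration only; it is not part of JanusGraph
    or of any Elasticsearch client library.
    """
    # ES 6.x takes a list under "index_patterns"; ES 5.x takes a string under "template".
    if es_major >= 6:
        body = {"index_patterns": [index_pattern]}
    else:
        body = {"template": index_pattern}

    # Analyzer and tokenizer definitions mirror the curl example from the question.
    body["settings"] = {
        "analysis": {
            "analyzer": {analyzer_name: {"tokenizer": tokenizer_name}},
            "tokenizer": {
                tokenizer_name: {"type": "uax_url_email", "max_token_length": 5}
            },
        }
    }
    return body


# Assumed pattern: matches indices such as janusgraph_customproperty.
template = build_index_template("janusgraph_*", "MY_CUSTOM_ANALYZER_NAME", "my_tokenizer")
print(json.dumps(template, indent=2))
```

The printed body would be sent with a PUT request to the `_template` endpoint under a name of your choice, before JanusGraph creates the mixed index, so that the analyzer already exists when the field mapping refers to it.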

Thomas

On Friday, 20 April 2018 at 07:58:47 UTC-4, kdr b wrote:


kdrb...@...
 

Hello,

JanusGraph enables users to utilize ElasticSearch as an additional indexing backend. Instructions on how to accomplish this are available here http://docs.janusgraph.org/latest/index-parameters.html, and following these instructions works fine.

The JanusGraph documentation indicates here http://docs.janusgraph.org/latest/field-mapping.html#_custom_analyser that it should be possible to specify and use custom analyzers in ElasticSearch (Paragraph 23.2.1.1). If I understand the documentation correctly, the following steps

customproperty = mgmt.makePropertyKey('customproperty').dataType(String.class).make()
mgmt.buildIndex('customproperty', Vertex.class).addKey(customproperty, Mapping.TEXT.asParameter(), Parameter.of(ParameterType.TEXT_ANALYZER.getName(), 'MY_CUSTOM_ANALYZER_NAME')).buildMixedIndex("search")

should ensure that an index is created with an analyzer named MY_CUSTOM_ANALYZER_NAME. This is exactly what I have been trying to achieve, so far without success. I can verify that the index is indeed created; however, when querying its settings with http://localhost:9200/janusgraph_customproperty/_settings I get

{
  "janusgraph_customproperty": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "janusgraph_customproperty",
        "max_result_window": "2147483647",
        "creation_date": "1524156242828",
        "number_of_replicas": "1",
        "uuid": "MsELAZBRT8ONf_Jjk0Yavw",
        "version": {
          "created": "5060399"
        }
      }
    }
  }
}

without any indication of the custom analyzer, which is visible, however, when creating an index manually, for instance with:

curl -XPUT 'localhost:9200/email_index?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "uax_url_email",
          "max_token_length": 5
        }
      }
    }
  }
}
'

and then querying it with http://localhost:9200/email_index/_settings:

{
  "email_index": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "email_index",
        "creation_date": "1524156863109",
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "type": "uax_url_email",
              "max_token_length": "5"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "xZS2_jGAR8qM5kUjozJRBw",
        "version": {
          "created": "5060399"
        }
      }
    }
  }
}
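The difference between the two responses can also be checked programmatically. The following is a small Python sketch, not part of JanusGraph or ElasticSearch; the helper name `defined_analyzers` is made up for illustration, and the sample data is an abbreviated copy of the two `_settings` responses above.

```python
import json


def defined_analyzers(settings_response, index_name):
    """Return the sorted names of analyzers defined in a _settings response."""
    index_settings = settings_response[index_name]["settings"]["index"]
    # An index without custom analysis settings has no "analysis" key at all.
    return sorted(index_settings.get("analysis", {}).get("analyzer", {}))


# Abbreviated copy of the response for the index created by JanusGraph.
janusgraph_settings = json.loads("""
{"janusgraph_customproperty": {"settings": {"index": {
    "number_of_shards": "5",
    "provided_name": "janusgraph_customproperty"
}}}}
""")

# Abbreviated copy of the response for the manually created index.
email_settings = json.loads("""
{"email_index": {"settings": {"index": {
    "provided_name": "email_index",
    "analysis": {
        "analyzer": {"my_analyzer": {"tokenizer": "my_tokenizer"}},
        "tokenizer": {"my_tokenizer": {"type": "uax_url_email", "max_token_length": "5"}}
    }
}}}}
""")

print(defined_analyzers(janusgraph_settings, "janusgraph_customproperty"))  # prints []
print(defined_analyzers(email_settings, "email_index"))  # prints ['my_analyzer']
```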

This manual index creation brings me to another point. In the curl request I am able to specify every setting of the index (such as the tokenizer to use). Am I correct to assume that I cannot accomplish this through the management API at this point? Frankly, the more urgent question for me is whether the previously mentioned use case is supposed to work as I interpreted and described it.
Since my colleagues and I are using JanusGraph in combination with ElasticSearch anyway, we would not mind putting some work into this.

Regards