Quickstart

ElasticHash implements efficient similarity search over binary hash codes in Elasticsearch using a two-stage method. In the first stage, a coarse search on short hash codes is performed using multi-index hashing and an Elasticsearch terms lookup of neighboring hash codes. In the second stage, the resulting candidates are re-ranked by Hamming distance computed on the long hash codes.
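The two stages can be illustrated without Elasticsearch. The following is a minimal in-memory sketch of the idea, not ElasticHash's implementation: all function names are hypothetical, and the 64-bit short code taken from the leading bits, the split into four substrings, and the one-bit search radius are assumptions chosen for illustration.

```python
from collections import defaultdict

LONG_BITS = 256    # long codes, used for re-ranking
SHORT_BITS = 64    # assumption: short code = leading 64 bits of the long code
NUM_PARTS = 4      # assumption: short code is split into four substrings
PART_BITS = SHORT_BITS // NUM_PARTS


def short_parts(long_code):
    """Return the substrings of the short code, most significant first."""
    short = long_code >> (LONG_BITS - SHORT_BITS)
    mask = (1 << PART_BITS) - 1
    return [(short >> (PART_BITS * (NUM_PARTS - 1 - i))) & mask
            for i in range(NUM_PARTS)]


def neighbors_within_one(part):
    """Yield the substring itself and every substring one bit flip away."""
    yield part
    for bit in range(PART_BITS):
        yield part ^ (1 << bit)


def hamming(a, b):
    """Number of differing bits between two integer-encoded codes."""
    return bin(a ^ b).count("1")


def build_index(codes):
    """Build one lookup table per substring position: substring -> document ids."""
    tables = [defaultdict(set) for _ in range(NUM_PARTS)]
    for doc_id, code in codes.items():
        for pos, part in enumerate(short_parts(code)):
            tables[pos][part].add(doc_id)
    return tables


def search(query, codes, tables, top_k=10):
    # Stage 1: coarse search. Collect every document that shares a
    # neighboring substring with the query in at least one position.
    candidates = set()
    for pos, part in enumerate(short_parts(query)):
        for neighbor in neighbors_within_one(part):
            candidates |= tables[pos].get(neighbor, set())
    # Stage 2: re-rank the candidates by Hamming distance on the long codes.
    ranked = sorted(candidates, key=lambda d: (hamming(query, codes[d]), d))
    return [(d, hamming(query, codes[d])) for d in ranked[:top_k]]
```

The coarse stage keeps the candidate set small because only documents matching a nearby substring are ever compared in full; the exact Hamming distance on all 256 bits is computed for those candidates only.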

For a complete image similarity search system, including model training and model serving, see https://github.com/umr-ds/ElasticHash.

Important

Currently, only 256-bit codes are supported.

Install the Python package:

pip install elastichash

Create an Elasticsearch client and pass it to ElasticHash:

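from elasticsearch import Elasticsearch
from elastichash import ElasticHash  # import path assumed from the package name
es = Elasticsearch(elasticsearch_endpoint)  # URL of your Elasticsearch instance
eh = ElasticHash(es)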

Add new items by calling add(). The code can be a list, str or np.ndarray, and additional fields can be stored alongside it:

eh.add(code, additional_fields={"image_path": "/path/to/an/image"})

After a sufficiently large number of codes has been added (e.g. 10,000), decorrelate() must be called to rearrange the binary hash code permutations.

To search documents by their hash code, use search(). By string:

search('0010100101010010010100100100101000101001010100100101001001001010010101001001010010010100100101001010100100101001001010010001001001010010101001001010010010100100010101001001010010010100100101001010100100101001001010010001001001010010101001001010010010100100')

Or by a list of strings or integers:

search(['0','0','1','0','1','0','0','1','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','0','1','0','1','0','0','0','1','0','1','0','0','1','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','0','1','0','1','0','0','1','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','0','1','0','0','1','0','0','1','0','1','0','0','1','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','0','1','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','0','1','0','0','1','0','0','1','0','1','0','0','1','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0','1','0','1','0','0','1','0','0'])
search([0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0])
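The string and list forms above carry the same 256 bits. A small helper such as the following can normalize either form to a bit string and reject codes of the wrong length before they are passed to search(). The helper is hypothetical and not part of the elastichash package.

```python
def to_bit_string(code, n_bits=256):
    """Normalize a str, a list of '0'/'1' strings, or a list of 0/1 ints to a bit string."""
    # Iterating a str yields its characters; iterating a list yields its elements.
    bits = [str(b) for b in code]
    if len(bits) != n_bits or any(b not in ("0", "1") for b in bits):
        raise ValueError(f"expected exactly {n_bits} bits, each given as 0 or 1")
    return "".join(bits)
```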

Or use a list or numpy.ndarray of four (long) integer values as the query:

search([1,-1,1000,-1000])
search(["1","-1","1000","-1000"])
search(numpy.array([1, -1, 1000, -1000]))
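Since only 256-bit codes are supported, four integers suggest 64 bits each, and the negative values in the examples point to a signed two's-complement interpretation. The sketch below shows one way to convert between a 256-character bit string and four such values. The helper names and the most-significant-chunk-first ordering are assumptions made for illustration; check the package documentation for the ordering ElasticHash actually expects.

```python
def bits_to_longs(bits, chunk=64):
    """Pack a bit string into signed integers of `chunk` bits each."""
    if len(bits) % chunk:
        raise ValueError("bit string length must be a multiple of the chunk size")
    longs = []
    for i in range(0, len(bits), chunk):
        value = int(bits[i:i + chunk], 2)
        if value >= 1 << (chunk - 1):   # top bit set: negative in two's complement
            value -= 1 << chunk
        longs.append(value)
    return longs


def longs_to_bits(longs, chunk=64):
    """Inverse of bits_to_longs: expand signed integers into a bit string."""
    mask = (1 << chunk) - 1
    return "".join(format(v & mask, f"0{chunk}b") for v in longs)
```

Under these assumptions, longs_to_bits([1, -1, 1000, -1000]) produces a 256-character string whose second 64-bit chunk consists entirely of ones, and bits_to_longs() maps it back to the original four values.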