Solr vs Elasticsearch

Updated October 2020

Both Solr and Elasticsearch are popular open source search engines built on top of Lucene.  This article is intended to help readers learn more about the technologies in relation to one another to guide technology decisions.

Quick Reference Comparison of Elasticsearch vs Solr

As far as speed and performance go, Elasticsearch and Solr are often comparable, but that can change depending on your specific use case.   Both technologies also have a large community of contributors and several options for expert support from companies like Dattell.  

Solr is open source software with a community of committers who are selected on merit alone.  Elasticsearch is also open source.  However, it has premium options that fall under a proprietary license, and committers are limited to Elastic employees.

If you have experience with Solr, need advanced options for full-text searches, or are an open source purist, then Solr might be the right choice for you.  Otherwise, there are several reasons we suggest Elasticsearch in most cases.  Elasticsearch is currently the most popular search engine database, and it has up-to-date and thorough documentation.  Elasticsearch’s user-friendly API adds to its popularity with developers, and its horizontal scaling capabilities make it a solution that can grow with your company.

Popularity

As of October 2020, according to DB-Engines, Elasticsearch is the most popular search engine database, and Solr is third.  While Solr was initially more popular than Elasticsearch, Elasticsearch surpassed Solr in late 2015 and has continued to increase adoption. DB-Engines rankings consider how many times a technology is being mentioned on websites, search engines, IT-related Q&A sites, and social networks.  The rankings also factor in the number of job offers and professional network profiles that include the technology.

Use Cases

Solr is intended for enterprise-directed text searches requiring information retrieval and/or analytics.  It is popular for its full-text search and rich document handling (e.g. PDF and Word docs) using the Apache Tika library.  Full-text searches are distinguished from metadata or partial text searches (e.g. titles, abstracts, etc.).  Solr has a greater scope of features for full-text search at the moment.  However, Elasticsearch is actively improving its full-text search features.

Elasticsearch is tailored for processing time series data, analytics, and scaling.  Like Solr, Elasticsearch can also perform full-text searches, and it can read rich documents, like PDF and Word docs, using Apache Tika.  Elasticsearch interacts with data in JSON format making it an easy choice for interacting with web applications.  In addition to these specific use cases, Elasticsearch is suitable as primary data storage.

Documentation

Solr's documentation is adequate, though it is not as strong as it used to be.  Committers to Solr are determined on merit alone, not company affiliation.

Elasticsearch has thorough and up-to-date documentation, including clear examples, available on the Elastic website.  It has a large community of contributors, but only Elastic employees can commit changes.

Install & Configure

Java is required for both Elasticsearch and Solr.  By default, Elasticsearch requires more heap memory than Solr (1 GB vs 512 MB), but these defaults can be changed.  The Elasticsearch download is also roughly 100 MB larger than Solr's in its compressed form.

Solr has a history of being difficult to approach, but has gotten better recently.  With Solr, users also need to install and optimize ZooKeeper. More on that below.  Additionally, Solr requires a managed-schema file.

Elasticsearch is more approachable for new users, with a single download.  Also, Elasticsearch is schemaless.  This means that an index schema is not required to start ingesting and indexing documents.  In most cases you do want to set up indices manually, but even without a schema Elasticsearch does quite well.  Continue reading for more on indexing:  How to Index Elasticsearch
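As a rough illustration of setting up an index manually rather than relying on schemaless defaults, the sketch below builds the JSON body for an index-creation request with explicit field types.  The index name `sales` and every field name are hypothetical and chosen only for this example.  The body follows the typeless mapping format used by Elasticsearch 7 and later.

```python
import json


def build_index_body(shards: int = 1, replicas: int = 1) -> dict:
    """Return a create-index body with explicit field types (hypothetical fields)."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                # keyword: exact-match values, suitable for grouping and filtering
                "customer_category": {"type": "keyword"},
                "sale_total": {"type": "double"},
                "sold_at": {"type": "date"},
                # text: analyzed for full-text search
                "notes": {"type": "text"},
            }
        },
    }


if __name__ == "__main__":
    # Body for a request such as: PUT /sales
    print(json.dumps(build_index_body(shards=3), indent=2))
```

Without a mapping like this, Elasticsearch infers field types from the first documents it sees, which is convenient but can pick a type you did not intend.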

Scalability

Both Solr and Elasticsearch support sharding, which allows for information to be distributed over multiple servers.
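The idea behind sharding can be sketched in a few lines: hash a routing key and use the result to pick a shard.  This is a simplified model, not the routing code of either engine.  CRC32 is used here only as a stand-in hash, and the function name `pick_shard` is an assumption made for this example.

```python
import zlib
from collections import Counter


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number (simplified; CRC32 is a stand-in hash)."""
    if num_primary_shards < 1:
        raise ValueError("num_primary_shards must be at least 1")
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Count how 1,000 hypothetical document IDs spread across 3 shards.
    spread = Counter(pick_shard(f"doc-{i}", 3) for i in range(1000))
    print(dict(spread))
```

Because the shard depends on the number of shards, changing that number moves documents between shards, which is one reason splitting and rebalancing need dedicated tooling.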

Solr allows users to split existing shards, but rebalances are difficult to manage.  Solr requires ZooKeeper for cluster coordination.

Elasticsearch is designed for horizontal scaling, making it inherently easier to scale.  Shards can also be split in an existing index by using either the Split Index API or Reindex API.  
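A minimal sketch of the two requests involved in a Split Index API call is shown below.  It only builds the request paths and bodies and does not contact a cluster.  The index names are hypothetical, and the check that the target shard count is a multiple of the source count reflects one documented constraint; a real cluster may impose further limits.

```python
import json


def build_split_requests(source: str, target: str,
                         source_shards: int, target_shards: int) -> list:
    """Return (method, path, body) tuples for splitting an index (no network calls)."""
    if target_shards <= source_shards or target_shards % source_shards != 0:
        raise ValueError("target shard count must be a larger multiple of the source count")
    return [
        # Step 1: block writes on the source index before splitting it.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": True}}),
        # Step 2: split into a new index with more primary shards.
        ("POST", f"/{source}/_split/{target}",
         {"settings": {"index.number_of_shards": target_shards}}),
    ]


if __name__ == "__main__":
    for method, path, body in build_split_requests("sales", "sales-split", 2, 4):
        print(method, path, json.dumps(body))
```

The Reindex API is the alternative when the desired shard count is not a multiple of the current one, at the cost of copying every document into a new index.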

Elasticsearch allows for more automation with cluster rebalancing, often completely hands-off.  Elasticsearch handles cluster coordination internally (no ZooKeeper).  Continue reading about Elasticsearch shard optimization.

Data Sources

Solr supports over a thousand file types using the Apache Tika library, and it has request handlers for a number of popular file types such as CSV, Word docs, PDF, and XML.

Elasticsearch uses JSON to ingest data from multiple sources and uses both lightweight data shippers (Beats, such as Filebeat) and Logstash as part of the data pipeline.  Like Elasticsearch, both Beats and Logstash are part of the Elastic Stack.  Like Solr, Elasticsearch can also use the Apache Tika library to support rich documents.
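To make the JSON ingestion format concrete, the sketch below serializes a few documents into the newline-delimited JSON that the Elasticsearch Bulk API accepts: one action line followed by one document line, with a trailing newline.  The index name, document IDs, and fields are hypothetical, and the helper `to_bulk_ndjson` is an illustration, not part of any client library.

```python
import json


def to_bulk_ndjson(index: str, docs) -> str:
    """Serialize (doc_id, document) pairs into a Bulk API request body."""
    lines = []
    for doc_id, doc in docs:
        # Action line: tells Elasticsearch where to index the next document.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = to_bulk_ndjson("sales", [
        ("1", {"customer_category": "retail", "sale_total": 10.0}),
        ("2", {"customer_category": "wholesale", "sale_total": 250.0}),
    ])
    # Body for a request such as: POST /_bulk
    print(payload, end="")
```

Shippers such as Filebeat and Logstash produce requests of this general shape on your behalf, so hand-building them is mostly useful for small scripts and tests.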

Query DSL

Solr has improved its query capabilities in recent versions.  With the latest versions of Solr (starting with 7), there is more capability for structured queries using JSON.  However, in older versions of Solr the URI search led to complicated queries.
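The difference between the two Solr styles can be sketched by expressing the same search both ways.  The example below builds a URI-style query string and an equivalent JSON request body; it does not send anything.  The field names are hypothetical, and the JSON keys `query`, `filter`, and `limit` are the JSON Request API counterparts of the `q`, `fq`, and `rows` parameters.

```python
import json
from urllib.parse import urlencode


def uri_style(q: str, filters: list, rows: int) -> str:
    """Build a classic URI search: everything is packed into query parameters."""
    params = [("q", q)] + [("fq", f) for f in filters] + [("rows", rows)]
    return "/select?" + urlencode(params)


def json_style(q: str, filters: list, rows: int) -> dict:
    """Build the same search as a structured JSON request body."""
    return {"query": q, "filter": list(filters), "limit": rows}


if __name__ == "__main__":
    q, filters = "title:solr", ["year:2020", "lang:en"]
    print(uri_style(q, filters, 10))
    print(json.dumps(json_style(q, filters, 10), indent=2))
```

With only a few parameters the two forms are comparable; the JSON form tends to stay readable as filters, facets, and nested clauses are added, while the URI form grows into one long escaped string.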

Elasticsearch is designed for structured queries using JSON.  This format allows for sophisticated queries by combining multiple kinds of queries together.  Elasticsearch also has a versatile aggregation engine for nested data analysis.  For instance, with Elasticsearch you can calculate the average sale total for each customer category.
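The average-sale-per-category example might look like the sketch below: a `terms` aggregation that buckets documents by category, with an `avg` sub-aggregation inside each bucket.  The field names `customer_category` and `sale_total` are assumptions, as are the aggregation names.  A small pure-Python function is included to show what the aggregation computes; it is not how Elasticsearch executes it.

```python
import json
from collections import defaultdict


def avg_sale_by_category_query() -> dict:
    """Search body: bucket by category, then average sale_total per bucket."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "by_category": {
                "terms": {"field": "customer_category"},
                "aggs": {"avg_sale": {"avg": {"field": "sale_total"}}},
            }
        },
    }


def avg_sale_by_category_local(rows: list) -> dict:
    """Compute the same result in plain Python, for illustration only."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["customer_category"]].append(row["sale_total"])
    return {cat: sum(vals) / len(vals) for cat, vals in groups.items()}


if __name__ == "__main__":
    print(json.dumps(avg_sale_by_category_query(), indent=2))
    sample = [
        {"customer_category": "retail", "sale_total": 10.0},
        {"customer_category": "retail", "sale_total": 20.0},
        {"customer_category": "wholesale", "sale_total": 100.0},
    ]
    print(avg_sale_by_category_local(sample))
```

Nesting one aggregation inside another is what the article means by nested data analysis: further levels, such as a per-month breakdown within each category, follow the same pattern.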

Summarizing Solr vs Elasticsearch

Both Solr and Elasticsearch are popular search engine databases with large, involved communities and similar capabilities.  Elasticsearch is more approachable for new users, easier to scale, and has better querying and analytics capabilities than Solr.  Both databases can do full-text searches and read rich documents using the Apache Tika library.  

Elasticsearch Support with Elastic Certified Engineers

Dattell’s Elastic Certified Engineers work one-on-one with companies to design, implement, manage, and improve their Elasticsearch deployments.  Pricing for Elasticsearch support services starts at $2,400.
