Updated December 2022
Both Apache Solr and Elasticsearch are popular search engines built on top of Apache Lucene (Solr is open source; Elasticsearch's license changed in 2021, as explained below). This article compares the two technologies to help guide technology decisions.
Quick Reference Comparison of Elasticsearch vs Solr
As far as speed and performance go, Elasticsearch and Solr are often comparable, but that can change depending on your specific use case. Both technologies also have a large community of contributors and several options for expert support from companies like Dattell.
Solr is open source software whose committers are chosen on merit alone. Elasticsearch was formerly open source under the Apache 2.0 License. However, starting with version 7.11, released in 2021, Elasticsearch is licensed under the Server Side Public License (SSPL) and the Elastic License, so it remains free to use but is no longer open source. For more information on the licensing limitations, check out our Elasticsearch basics post. Additionally, only Elastic employees can commit changes to Elasticsearch.
If you have experience with Solr, need advanced options for full-text search, want to offer your product as a managed service, or are an open source purist, then Solr might be the right choice for you. Otherwise, we recommend Elasticsearch in most cases, for several reasons. Elasticsearch is currently the most popular database of its kind, and it has thorough, up-to-date documentation. Its user-friendly API adds to its popularity with developers, and its horizontal scaling capabilities make it a solution that can grow with your company.
Popularity
As of December 2022, according to DB-Engines, Elasticsearch is the most popular search engine database, and Solr is third. Solr was initially more popular, but Elasticsearch surpassed it in late 2015, and Elasticsearch's adoption has mostly continued to grow since.
DB-Engines rankings consider how often a technology is mentioned on websites, search engines, IT-related Q&A sites, and social networks. The rankings also factor in the number of job offers and professional network profiles that include the technology.
Another interesting finding in this graph is that OpenSearch is number four. OpenSearch is a fully open source fork of Elasticsearch v7.10, licensed under the Apache License v2.0. Learn more in our post What is OpenSearch.
Or check out this post about OpenSearch vs. Elasticsearch to learn more about how they compare.
Use Cases
Solr is intended for enterprise text search requiring information retrieval and/or analytics. It is popular for its full-text search and its rich document handling (e.g. PDF and Word docs) using the Apache Tika library. Full-text searches are distinguished from metadata or partial-text searches (e.g. titles or abstracts). Solr currently offers a broader set of full-text search features, although Elasticsearch is actively improving in this area.
Elasticsearch is tailored for processing time series data, analytics, and scaling. Like Solr, Elasticsearch can also perform full-text searches, and it can read rich documents, like PDF and Word docs, using Apache Tika. Elasticsearch interacts with data in JSON format, making it an easy choice for web applications. In addition to these specific use cases, Elasticsearch is suitable as primary data storage.
Documentation
Solr's documentation is decent, though not as good as it used to be. Solr committers are chosen on merit alone, not on company affiliation.
Elasticsearch has thorough, up-to-date documentation, including clear examples, on the Elastic website. It has a large community of contributors, but only Elastic employees can commit changes.
Install & Configure
Both Elasticsearch and Solr run on Java (recent Elasticsearch releases bundle their own JDK). By default, Elasticsearch requires more heap memory than Solr (1 GB vs 512 MB), but these defaults can be changed. Elasticsearch is also roughly 100 MB larger than Solr in its compressed form.
Solr has a history of being difficult to approach, but it has gotten better recently. For a distributed (SolrCloud) deployment, users also need to install and tune ZooKeeper; more on that below. Additionally, Solr requires a managed-schema file.
Elasticsearch is more approachable for new users, with a single download. Elasticsearch is also schemaless, meaning an index schema (mapping) is not required before you start ingesting and indexing documents. In most cases you will still want to set up index mappings manually, but even without them Elasticsearch does quite well. Continue reading for more on indexing: How to Index Elasticsearch.
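To illustrate what "schemaless" means in practice, here is a minimal Python sketch that mimics, in simplified form, how Elasticsearch's default dynamic mapping picks a field type from a JSON document: integers become `long`, floating-point numbers `float`, booleans `boolean`, nested objects get their own `properties`, and strings become `text` with a `keyword` sub-field. It deliberately ignores date detection, numeric detection, arrays, and dynamic templates, so treat it as an approximation rather than Elasticsearch's actual logic. The helper names `infer_field` and `infer_mapping` and the sample document are hypothetical.

```python
from typing import Any


def infer_field(value: Any) -> dict:
    """Approximate the mapping default dynamic mapping would pick for one value.

    Simplified sketch: date/numeric detection, arrays, and templates are ignored.
    """
    # bool must be checked before int, because True is an int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        # Nested JSON objects are mapped as object fields with their own properties.
        return infer_mapping(value)
    if isinstance(value, str):
        # Strings get a full-text "text" field plus a "keyword" sub-field
        # for exact matching, sorting, and aggregations.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_mapping(doc: dict) -> dict:
    """Hypothetical helper: infer a mapping for every field of a JSON object."""
    # Null values do not create a field mapping, so they are skipped.
    return {
        "properties": {
            name: infer_field(v) for name, v in doc.items() if v is not None
        }
    }


order = {"customer": "Acme", "items": 3, "total": 19.99, "paid": True,
         "address": {"city": "Denver"}, "coupon": None}
print(infer_mapping(order))
```

Writing the mapping yourself produces a body of the same shape; doing it up front lets you choose, for example, `keyword` rather than `text` for identifiers.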
Scalability
Both Solr and Elasticsearch support sharding, which allows for information to be distributed over multiple servers.
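Sharding works by assigning each document to exactly one shard. Elasticsearch does this by hashing a routing value (the document ID by default) and reducing the result modulo the number of primary shards; it actually uses a Murmur3 hash, and newer versions refine the formula to support shard splitting. The sketch below swaps in Python's `zlib.crc32` as a stand-in hash purely to show the idea, so the function name `pick_shard` is hypothetical and it will not reproduce Elasticsearch's real shard assignments.

```python
import zlib
from collections import Counter


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Hypothetical helper: map a routing value (e.g. a document ID) to a shard.

    CRC32 is a stand-in here; Elasticsearch itself uses a Murmur3 hash.
    """
    if num_primary_shards < 1:
        raise ValueError("need at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Spread 1,000 hypothetical document IDs over 3 primary shards.
counts = Counter(pick_shard(f"doc-{i}", 3) for i in range(1000))
print(sorted(counts.items()))
```

The modulo step also shows why the primary shard count of an existing index cannot simply be edited in place: a different shard count would route existing documents to different shards.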
Solr allows users to split existing shards, but rebalancing is difficult to manage. In SolrCloud mode, Solr requires ZooKeeper for cluster coordination.
Elasticsearch is designed for horizontal scaling, making it inherently easier to scale. The shards of an existing index can also be split, either with the Split Index API, which creates a new index with more primary shards, or by reindexing into a new index with the Reindex API.
Elasticsearch allows for more automation with cluster rebalancing, which is often completely hands-off. Elasticsearch handles cluster coordination internally, without ZooKeeper. Continue reading about Elasticsearch shard optimization.
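The Split Index API has preconditions worth checking before you call it: the source index must first be made read-only (for example by setting `index.blocks.write` to `true`), and the target index must have more primary shards than the source, in a number that is a multiple of the source's. The Python sketch below checks only that shard-count rule and builds the settings body a split request would carry. The helper name `split_request_body` is hypothetical; it does not contact a cluster or check other limits, such as those set by `index.number_of_routing_shards` or available disk space.

```python
def split_request_body(source_shards: int, target_shards: int) -> dict:
    """Hypothetical helper: validate a shard split and build its request body.

    Checks only the shard-count rule; it sends nothing to a cluster.
    """
    if source_shards < 1:
        raise ValueError("source must have at least one primary shard")
    if target_shards <= source_shards:
        raise ValueError("target must have more primary shards than the source")
    if target_shards % source_shards != 0:
        raise ValueError(
            f"{target_shards} is not a multiple of {source_shards} source shards"
        )
    # Body for POST /<source-index>/_split/<target-index>
    return {"settings": {"index.number_of_shards": target_shards}}


print(split_request_body(2, 6))  # → {'settings': {'index.number_of_shards': 6}}
```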
Data Sources
Solr supports over a thousand file types using the Apache Tika library, and it has request handlers for a number of popular file types such as CSV, Word docs, PDF, and XML.
Elasticsearch ingests data from multiple sources as JSON, using both lightweight data shippers (such as Filebeat) and Logstash as part of the data pipeline. Like Elasticsearch itself, Beats and Logstash are part of the Elastic Stack. Like Solr, Elasticsearch can also use the Apache Tika library to support rich documents.
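Because Elasticsearch ingests JSON, bulk loading comes down to building newline-delimited JSON (NDJSON) for the `_bulk` endpoint: each document is preceded by an action line such as `{"index": {"_index": ...}}`, and the body must end with a newline. The Python sketch below only builds that payload; the helper name `to_bulk_ndjson`, the index name `sales`, and the sample document are hypothetical, and in practice Beats, Logstash, or an official client library would do this for you.

```python
import json


def to_bulk_ndjson(index: str, docs: list) -> str:
    """Hypothetical helper: build an NDJSON request body for the _bulk API."""
    if not docs:
        raise ValueError("a bulk request needs at least one document")
    lines = []
    for doc in docs:
        # Action/metadata line first, then the document source on its own line.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


payload = to_bulk_ndjson("sales", [{"customer": "Acme", "total": 19.99}])
print(payload, end="")
```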
Query DSL
Solr has improved its query capabilities in recent versions. In older versions of Solr, URI search led to complicated queries; starting with Solr 7, there is more support for structured queries written in JSON.
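To show the difference, here is the same search written both ways in Python: first as classic URI parameters (`q`, `fq`, `rows`, `sort`), then as a body for Solr's JSON Request API, which accepts equivalent keys (`query`, `filter`, `limit`, `sort`). The field names (`title`, `category`, `price`) and the `<collection>` placeholder are assumptions for illustration; the sketch only builds the requests and does not contact a Solr server.

```python
import json
from urllib.parse import urlencode

# Classic URI search: every part of the query is packed into URL parameters.
uri_params = urlencode(
    [
        ("q", "title:laptop"),
        ("fq", "category:electronics"),
        ("fq", "price:[100 TO 500]"),
        ("rows", 10),
        ("sort", "price asc"),
    ]
)

# JSON Request API: the same search expressed as a structured JSON body.
json_body = json.dumps(
    {
        "query": "title:laptop",
        "filter": ["category:electronics", "price:[100 TO 500]"],
        "limit": 10,
        "sort": "price asc",
    }
)

print("GET /solr/<collection>/select?" + uri_params)
print("POST /solr/<collection>/query", json_body)
```

The URL-encoded form (`price%3A%5B100+TO+500%5D`) hints at why longer URI queries become hard to read, while the JSON body keeps each clause separate.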
Elasticsearch is designed for structured queries using JSON. This format allows for sophisticated queries by combining multiple kinds of queries together. Elasticsearch also has a versatile aggregation engine for nested data analysis. For instance, with Elasticsearch you can calculate the average sale total for each customer category.
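As an illustration, the Python sketch below builds an Elasticsearch search body that combines a full-text `match` query with a date `range` filter inside a `bool` query, then groups matching orders by customer category with a `terms` aggregation and computes the average sale total per group with a nested `avg` aggregation. The field names (`description`, `order_date`, `customer_category`, `sale_total`) and the helper name `average_sale_by_category` are hypothetical; `customer_category` is assumed to be mapped as a `keyword` field so it can be aggregated on.

```python
import json


def average_sale_by_category(search_text: str, since: str) -> dict:
    """Hypothetical helper: build a body for POST /<index>/_search."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {"description": search_text}}],
                # Unscored filter clause restricting the date range.
                "filter": [{"range": {"order_date": {"gte": since}}}],
            }
        },
        "aggs": {
            "by_category": {
                # One bucket per customer category...
                "terms": {"field": "customer_category"},
                # ...with the average sale total computed inside each bucket.
                "aggs": {"avg_sale_total": {"avg": {"field": "sale_total"}}},
            }
        },
    }


print(json.dumps(average_sale_by_category("laptop", "2022-01-01"), indent=2))
```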
Summarizing Solr vs Elasticsearch
Both Solr and Elasticsearch are popular search engine databases with large, involved communities and similar capabilities. Elasticsearch is more approachable for new users, easier to scale, and has better querying and analytics capabilities than Solr. Both databases can do full-text searches and read rich documents using the Apache Tika library.
Elastic Stack Consulting Services
If you are interested in 24/7 support, consulting, and/or fully managed Elasticsearch services on your environment, you can find more information on our Elasticsearch consulting page.