From MySQL full-text search to Elasticsearch
Migrating backend search technologies on a high-throughput production site is no easy task, but Vector Media Group was recently faced with this decision. With a popular client site struggling under the load of complex MySQL full-text search queries, they recently switched to Elasticsearch.
I spoke with Matt Weinberg to learn how the migration went. Was the switch to Elaticsearch worth the effort?
How did you handle search before Elasticsearch?
We created a custom search using MySQL queries and implemented it into our CMS for the project, ExpressionEngine.
What were the problems with this approach?
To support full-text search, we needed to use the MySQL MyISAM storage engine. This has major downsides, the primary one being full table locks: when a table is updated, no other changes to that table can be performed.
Our tables have considerable update activity, so this would result in sometimes-significant performance issues.
What about running the queries on a slave DB?
We ended up doing this. It was a fairly simple step and allowed us to switch to the InnoDB engine on the master, eliminating the table lock issues.
This bought us some time, but it wasn't a long-term solution: we basically were rolling our own search and this frequently involved complex queries that third-party search libraries could perform more efficiently. We ended up with massive queries composed of many JOINs plus AND/ORs - these aren't easy to maintain.
Besides query complexity, it's tough to beat the performance of a dedicated search solution.
What solutions did you consider besides Elasticsearch?
We also considered Solr and Amazon CloudSearch. However, we started with Elasticsearch as we heard really great things about it. Elasticsearch met all of our needs in our early experiments so we decided to continue with it.
How did you handle the actual switch from MySQL to Elasticsearch?
We started with a list of common search terms and result expectations and ensured the results looked solid to our team and to our client. Elasticsearch shined: in many cases, the results returned from Elasticsearch were more relevant than those from MySQL.
We rolled over to Elasticsearch in three stages:
- Launched without any UI changes: we just refactored the search UI to use the Elaticsearch results. Only a small percentage of our search traffic got routed to ES; the rest kept using the existing MySQL search.
- We monitored performance and results, and as those looked good, gradually ramped up the amount of traffic using Elasticsearch until it was 100%. We used Scout to continually monitor Elasticsearch, MySQL, and front-end server performance during this time.
- Once we were totally moved over to Elasticsearch and ensured our infrastructure was holding up, we launched new features that took advantage of what Elasticsearch had to offer.
What was the implementation time for Elasticsearch? perhaps a proof-of-concept timeline and a full rollout timeline?
About 2-3 months, including investigation time, tool building, and implementation.
Besides faster search, did Elasticsearch give you easy wins on new features?
Definitely. Facets/Aggregations are much much faster now than the MySQL way we were doing it before.
How does your Elasticsearch hardware compare to your MySQL setup?
Our MySQL master and slave are 8 core, 52GB memory machines (each). We have a single Elasticsearch box that is a 4 core, 15GB memory server and search performance is much better than standard MySQL queries. While we've continued to upgrade our MySQL hardware as the app's usage has grown, we haven't needed to bump up our Elasticsearch resources.
When you need to scale ES, have you thought about whether you'll add more nodes or scale vertically?
We'll almost definitely scale vertically for as long as we can before adding new nodes. While Elasticsearch makes adding new nodes easy, we prefer the ease of administration, monitoring, and deployment (along with a lower surface area for issues) that having fewer, larger nodes affords.
What's your favorite thing about Elasticsearch?
The search query syntax/API is really easy to work with and the client library support is great. Much better than massive MySQL SELECT queries.
Monitoring Elasticsearch with Scout
Scout provides three plugins for monitoring Elasticsearch:
Elasticsearch Cluster Status
Elasticsearch Cluster Status tracks key cluster health metrics and alerts when the status changes.
Elasticsearch Index Status
Elasticsearch Index Status reports key metrics on an Elasticsearch index.
Elasticsearch Node Status
Elasticserch Node Status reports stats on a specific node in an Elasticsearch cluster.
More about Vector Media Group
Vector Media Group is a 24-person interactive agency based in Manhattan. They specialize in web and mobile development, design, and online marketing. Their clients range from large Fortune 100 companies to small startups and everything in between. They are well-known for a variety of work, including their significant ExpressionEngine experience.