A Digest of Evernote’s Architecture (*)

원문: https://blog.evernote.com/tech/2011/05/17/architectural-digest/

 

A Digest of Evernote’s Architecture (*)

 

 

 

Let’s get things started with a coarse-grained overview of the physical makeup of the Evernote service. I won’t go into a lot of detail on each component here; we’ll aim to talk about the interesting bits in separate posts later.

Starting at the top-left corner of the diagram, all stats as of May 17th, 2011 …

Networking: Virtually all traffic to and from Evernote comes to www.evernote.com via HTTPS port 443. This includes all “web” activity, but also all client synchronization via our Thrift -basedservice API .  Altogether, that produces up to 150 million HTTPS requests per day, with peak traffic around 250 Mbps. (Unfortunately for our semi-nocturnal Operations team, this daily peak tends to arrive around 6:30 am, Pacific time.)

We use BGP  to direct traffic through fully independent network feeds from our primary (NTT ) and secondary (Level 3 ) providers. This is filtered through Vyatta  on the way to the new A10 load balancers  that we deployed in January when we hit the limit of SSL performance for our old balancers. We’re comfortably handling existing traffic using one AX 2500  plus a failover box, but we’re preparing to test their N+1 configuration in our staging cluster to prepare for future growth.

Shards: The core of the Evernote service is a farm of servers that we call “shards.”  Each shard handles all data and all traffic (web and API) for a cohort of 100,000 registered Evernote users.  Since we have more than 9 million users, this translates into around 90 shards.

Physically, shards are deployed as a pair a SuperMicro  boxes with two quad-core Intel processors, a ton of RAM and full chassis of Seagate  enterprise drives in mirrored RAID configurations. On top of each box, we run a base Debian  host that manages two Xen  virtual machines. The primary VM on a box runs our core application stack:  Debian + Java 6  + Tomcat + Hibernate  + Ehcache  +  Stripes  + GWT  + MySQL  (for metadata) + hierarchical local file systems (for file data).

All user data on the primary VM on one box is kept synchronously replicated to a secondary VM on a different box using DRBD . This means that each byte of user data is on at least four different enterprise drives across two different physical servers, plus nightly backups. If we have any problems with a server, we can fail the services from its primary VM over to the secondary on another box with minimal downtime via Heartbeat .

Since each users’ data is completely localized to one (virtual) shard host, we can run each shard as an independent island with virtually no crosstalk or dependencies. This means that issues on one shard don’t snowball to other shards.

To connect users with their shard, we push most of the work into the load balancers, which have a cascade of rules to find the shard in the URL and/or cookies.

UserStore: While the vast majority of all data is stored within the single-tier NoteStore shards, they all share a single master “UserStore” account database (also MySQL) with a small amount of information about each account, such as: username, MD5 password, and user shard ID. This database is small enough to fit in RAM, but we maintain high redundancy with the same combination of RAID mirroring, DRBD replication to a secondary, and nightly backups.

AIR processors: In order to allow you to search for words found within images in your notes, we maintain a pool of 28 servers that spend each day using their 8 cores to process new images. On a busy day, this translates into 1.3 or 1.4 million separate images. Currently, these use a mix of Linux and Windows, but we plan to convert them all to Debian by the end of the month now that we’ve removed some pesky legacy dependencies.

These servers run a pipeline of “Advanced Imaging and Recognition” (AIR) software developed by our R&D team. This software cleans up each image, identifies word-shaped regions, and then attempts to compile a weighted list of possible matches for each word using a set of “recognition engines” that each contribute a set of guesses.  This includes engines developed by our own team which specialize in, for example, handwriting recognition, as well as licensed technologies from best-of-breed commercial partners.

Other services: All of these servers are racked in a pair of dedicated cages at our data center in Santa Clara, California. In addition to the hardware that provides our core service, we also have smaller groups of servers for lighter-weight tasks that only require one or two boxes or Xen virtual machines. For example, our “incoming email” SMTP gateway is a pair of Debian servers with Postfix  and a custom Java mail processor built on top of Dwarf . Our @myen Twitter gateway is a simple in-house daemon using twitter4j .

Our corporate web site is Apache , our blogs are WordPress , most of our fully redundant internal switching topology is from HP , we use Puppet  for configuration management, and we monitor with Zabbix , Opsview , and AlertSite . We run nightly backups with a combination of different software that migrates data over a dedicated 1Gbps link to a secondary data center.

Wait, but why? I realize this post leaves lots of obvious questions about why we’ve chosen to do X instead of Y in a number of different places. Why run our own servers instead of using a cloud provider? Why such stuffy old software (Java, SQL, local storage, etc.) instead of hot new magic bullets? …

We’ll try to get into more details to answer these questions in the next few months.

(*) UPDATE, June 29, 2011: The title of this post was changed from “Architectural Digest” at the request of Conde Nast.