
I've seen a few guides posted on how to deploy a MongoDB replica set in EC2. All of them are pretty good, but they tend to overcomplicate things or skip a few of the big caveats that can make this configuration a nightmare to troubleshoot and get working properly. Here's my take on it.

I'm going to assume you've deployed to EC2 before. We won't be using advanced features like VPC for this. Although VPC is a great option for deploying Mongo, it adds an extra layer of complexity I don't want to get into. Maybe later?

First, some up-front information.

1) If you're deploying MongoDB as a replica set, you NEED an odd number of nodes. The reason for this is that the surviving nodes need a majority to vote a node up to master. An even number of nodes means the vote can tie, which means your ReplicaSet can end up stuck in a failed state with no Primary.

2) If you're deploying with the intent to be resilient to failures, you must put your nodes in different Availability Zones. If an entire AZ goes down, your ReplicaSet can still survive.

3) You can configure Mongo on the fly via the command line. Again, if the goal is resiliency, we want Mongo to start with our chosen settings every time. This means editing some configuration files in addition to doing things at the command line.

4) There are 3 types of Mongo nodes I'm going to talk about: Primary (also sometimes called master), Secondary (also sometimes called slave), and Arbiter (sometimes referred to as dataless). Primary is your read/write node.

Secondary is a read-only node that can become read/write if a failover occurs. An Arbiter exists with the sole purpose of breaking tied votes. You can have two Secondaries instead of a Secondary and an Arbiter, but most initial deployments use an Arbiter because a second Secondary isn't needed at that small a scale. We're also going to use an Arbiter in the example, since adding another Secondary would just repeat the steps we did for the first one.

With me so far? Good. Now to the meat of it:

Part 1:

EC2 Setup

1) Launch your EC2 instances. Remember, we need an odd number. If you're just trying Mongo ReplicaSets out, I recommend a Primary, Secondary, and Arbiter setup. You can use a non-dedicated system as the Arbiter if you want. For the sake of consistency, I'm going to assume 3 new nodes. Spin up three 64-bit versions of your favorite flavor of Linux, each in a different Availability Zone. An m1.small instance will suffice for testing purposes, but Mongo likes to keep all of its data in memory, so if your database is going to be large, adjust the instance size accordingly. Let's name them mongo1, mongo2, and mongo3. For the examples, here are the IP addresses of the servers (a rough launch sketch follows the list).

mongo1  10.40.202.101

mongo2  10.40.202.102

mongo3  10.40.202.103
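If you'd rather script the launch than click through the console, something like the following would do it. This is just a sketch that assumes you have the AWS CLI set up; the AMI ID, key pair name, security group, and AZ names are placeholders, not real values.

# hypothetical launch of the three nodes, one per Availability Zone
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type m1.small \
    --key-name my-keypair --security-groups mongo-sg \
    --placement AvailabilityZone=us-east-1a --count 1   # mongo1
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type m1.small \
    --key-name my-keypair --security-groups mongo-sg \
    --placement AvailabilityZone=us-east-1b --count 1   # mongo2
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type m1.small \
    --key-name my-keypair --security-groups mongo-sg \
    --placement AvailabilityZone=us-east-1c --count 1   # mongo3

However you launch them, make sure the security group lets the three nodes reach each other on TCP port 27017, Mongo's default.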

2) Create EBS volumes for your Mongo data. You'll want to mount them at /data/db, Mongo's default data path. Any filesystem type will do; we'll use ext4 for the example. If you're going to run an Arbiter node, it doesn't need an EBS volume, since it's not going to write any data. I'm assuming that your new EBS volume is going to be /dev/sdf for the code example below.
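If you haven't created the volumes yet, a rough AWS CLI sketch looks like this. The size, volume ID, and instance ID are placeholders; the volume has to live in the same AZ as the instance it attaches to.

# create a volume in the instance's Availability Zone (sketch)
aws ec2 create-volume --size 50 --availability-zone us-east-1a
# attach it to mongo1 as /dev/sdf, using the volume and instance IDs from the previous step
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/sdf

With the volume attached, format and mount it: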

mkdir -p /data/db

mkfs.ext4 /dev/sdf

mount /dev/sdf /data/db

Add the following to your /etc/fstab:

/dev/sdf /data/db auto noatime,noexec,nodiratime 0 0

df should show /dev/sdf mounted to /data/db

Do this on two nodes: mongo1 and mongo2. The Arbiter does not need persistent storage, so you don't need to add an EBS volume to mongo3.
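A quick sanity check (just a sketch) to confirm the fstab entry works before you rely on it:

sudo umount /data/db
sudo mount -a            # remounts everything listed in /etc/fstab
df -h /data/db           # should show /dev/sdf mounted at /data/db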

Optional:

3) When an EC2 instance is created, it's typically given a host name like "ip-10-40-202-168." If you want to use host names in the mongo config instead of IP addresses, you should set the host name on each server to something more friendly and add the host names and IP addresses to the /etc/hosts file on each server.

hostname mongo1

Add this to each /etc/hosts file:

mongo1  10.40.202.101

mongo2  10.40.202.102

mongo3  10.40.202.103
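Note that the hostname command only sets the name until the next reboot, and how to make it permanent varies by distro. On Debian/Ubuntu, something like this (a sketch, not gospel) makes it stick:

echo "mongo1" | sudo tee /etc/hostname     # read again at boot
sudo hostname mongo1                       # set it for the running system too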

Part 2:

Mongo Installation and Configuration

1) With the prep out of the way, you should have 3 EC2 instances, two of them backed with an EBS data volume and one with all the defaults. Now we install the actual MongoDB binaries. There should be repositories for your flavor of Linux that you can add to your systems. If you use the 10gen repo (and I recommend that you do), you'll have to specify the exact version of Mongo you want to install, and it's best to keep that version consistent across all your nodes. I've not run into any compatibility problems with differing versions of Mongo, but keep them in sync anyway. Available installation methods are here, including how to add the 10gen repo to your system and how to install and pin specific versions of mongodb.
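As a rough example, on Ubuntu the 10gen repo install looked something like the below. Treat the key ID, repo line, package name, and version number as assumptions and check the install docs linked above for the current values.

# add the 10gen signing key and repo (sketch for Ubuntu)
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | \
    sudo tee /etc/apt/sources.list.d/mongodb.list
sudo apt-get update
# install a specific version and hold it so the nodes stay in sync
sudo apt-get install mongodb-10gen=2.4.3
echo "mongodb-10gen hold" | sudo dpkg --set-selections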

2) With Mongo installed, we're almost there. We need to add some parameters to our mongo configuration before starting up mongodb. Edit /etc/mongodb.conf; it's heavily commented to guide you through your customizations. The one we care about is replSet = setname, which defines the replica set name and needs to be the same on every node. I recommend keeping the other defaults for this exercise and only adding the replica set name to the end of the default configuration.

...

#in replica set configuration, specify the name of the replica set

#replSet = setname

replSet = MyReplSet

3) Next we need to initialize the replica set. First start the MongoDB server:

sudo service mongodb start

Once started, log in to the mongo shell; just type mongo at the command line.

jhertz@mongo1:~$ mongo

MongoDB shell version: 2.4.3

connecting to: test

>

Once in the mongo shell, initializing the replica set is one command:

rs.initiate()

You should get "ok" as a response, and rs.status() should show that you have a single node in your replica set, named with whatever the local machine's hostname is. If you decided to set the hostnames of each server and added their IPs to /etc/hosts, you shouldn't need to change the configuration. Run rs.status() to see how it set up the initial node.

MyReplSet:PRIMARY> rs.status()

{

        "set" : "MyReplSet",

        "date" : ISODate("2013-05-20T17:46:55Z"),

        "myState" : 1,

        "members" : [

                {

                        "_id" : 0,

                        "name" : "mongo1:27017",

                        "health" : 1,

                        "state" : 1,

                        "stateStr" : "PRIMARY",

                        "uptime" : 1038371,

                        "optime" : {

                                "t" : 1368133781,

                                "i" : 2

                        },

                        "optimeDate" : ISODate("2013-05-09T21:09:41Z"),

                        "self" : true

                }

        ],
        "ok" : 1

}

If you did not set the host names and IPs, you will need to update the running configuration to look for hosts by IP address.

cfg = rs.conf()

cfg.members[0].host = "10.40.202.101:27017"

rs.reconfig(cfg)

What this does is read the running configuration into cfg, change the host value in cfg to the IP address and port, and then reconfigure the running replica set with the values in cfg.

4) Let's add our secondary node. Duplicate the configuration changes you made to the primary's /etc/mongodb.conf and start mongo as a service. Once started, go back to your Primary node in the mongo shell and type:

rs.add("mongo2:27017")

or

rs.add("10.40.202.102:27017")

You should see another "ok." Check rs.status() and you should see your Primary and the Secondary. The Secondary may still be syncing from the Primary, but if you tail /var/log/mongodb/mongodb.log, you should see connections being made between the nodes. If you don't, that means they can't see or talk to each other. Check the logs first to see if they tell you what the problem is. Typically it'll say something like "can't find host ip-10-40-202-168", which means the Secondary node can't resolve the Primary. Update your /etc/hosts or change the reference to the IP address.
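One quick way to test connectivity between the boxes (a sketch; it assumes the hostnames from Part 1) is to poke the other node's mongod from the shell:

# run from mongo2: if this ping fails, suspect /etc/hosts or the security group
mongo --host mongo1 --port 27017 --eval 'printjson(db.runCommand({ ping: 1 }))'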

5) With the Primary replicating data to the Secondary, we're nearly there. The last piece is the Arbiter node. Do as you did for the Secondary: update the mongo config and start mongo. Then, on the Primary in the mongo shell, type:

rs.addArb("mongo3:27017")

or

rs.addArb("10.40.202.103:27017")

This will add the arbiter node to the set.

rs.status() will now show a Primary, Secondary, and Arbiter.

MyReplSet:PRIMARY> rs.status()

{

        "set" : "MyReplSet",

        "date" : ISODate("2013-05-20T17:46:55Z"),

        "myState" : 1,

        "members" : [

                {

                        "_id" : 0,

                        "name" : "mongo1:27017",

                        "health" : 1,

                        "state" : 1,

                        "stateStr" : "PRIMARY",

                        "uptime" : 1038371,

                        "optime" : {

                                "t" : 1368133781,

                                "i" : 2

                        },

                        "optimeDate" : ISODate("2013-05-09T21:09:41Z"),

                        "self" : true

                },

                {

                        "_id" : 1,

                        "name" : "mongo2:27017",

                        "health" : 1,

                        "state" : 2,

                        "stateStr" : "SECONDARY",

                        "uptime" : 1037912,

                        "optime" : {

                                "t" : 1368133781,

                                "i" : 2

                        },

                        "optimeDate" : ISODate("2013-05-09T21:09:41Z"),

                        "lastHeartbeat" : ISODate("2013-05-20T17:46:55Z"),

                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),

                        "pingMs" : 1,

                        "syncingTo" : "mongo1:27017"

                },

                {

                        "_id" : 2,

                        "name" : "mongo3:27017",

                        "health" : 1,

                        "state" : 7,

                        "stateStr" : "ARBITER",

                        "uptime" : 1037894,

                        "lastHeartbeat" : ISODate("2013-05-20T17:46:53Z"),

                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),

                        "pingMs" : 14

                }

        ],

        "ok" : 1

}

What have we done?

We have a Primary Mongo node that's read/write, we have a Secondary replica that's read-only, and we have an Arbiter, which houses no data but votes for node promotion. Sweet deal, what can we do with it? Test failover. You can stop mongo on the Primary, then hop onto the mongo shell and issue an rs.status() to see that the Secondary has become the Primary and the Arbiter is still in there. If the old Primary comes back, it'll rejoin as a Secondary and stay that way until another failover promotes it again. The Arbiter will just sit there and make sure that there's a majority to elect a new Primary if the Primary goes down.
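A minimal failover drill, assuming the hostnames and init scripts from earlier, might look like this:

# on mongo1 (current primary): simulate a failure
sudo service mongodb stop
# on mongo2: watch the election; within a few seconds rs.status() should report it as PRIMARY
mongo --eval 'printjson(rs.status())'
# bring mongo1 back; it should rejoin the set as a SECONDARY
sudo service mongodb start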

You can scale your reads out across more Secondary nodes in the replica set as needed. The Arbiter is only needed if you'd otherwise end up with an even number of nodes. So if we need to scale out one more server, we'd stop the Arbiter and add another Secondary, making 3 mongo nodes with data on them. If we add another after that, we'd want to bring the Arbiter back. The nice thing about the Arbiter is that it's very lightweight; you can add it to an existing web server to save on costs if you need to. There are many more advanced configurations you can get into depending on your needs for mongo. I hope this serves as a decent primer to get you up and running in a test environment.
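For reference, swapping the Arbiter for a third data node would look roughly like this from the Primary's mongo shell. mongo4 here is a hypothetical new Secondary with its own EBS volume, set up the same way as mongo2.

// swap the arbiter for a third data-bearing node (sketch)
rs.remove("mongo3:27017")      // drop the arbiter from the set
rs.add("mongo4:27017")         // add the new secondary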