Self-note: Resume / recover a MongoDB change stream

Introduction

  • Sometimes replicator needs to be restarted
  • We cannot afford to lose one or two entries in time-series since it would throw the statistics off, and in exceptional cases, lost the max and min value

Resume & recover 

Change streams are resumable by specifying a resumeAfter token when opening the cursor. For the resumeAfter token, use the _id value of the change stream event document. Passing the _id value to the change stream attempts to resume notifications starting after the specified operation.

IMPORTANT

In the example below, resumeToken contains the change stream notification id. The resumeAfter takes a parameter that must resolve to a resume token. Passing the resumeToken to the resumeAfter modifier directs the change stream to attempt to resume notifications starting after the operation specified.

copy
copied

If the featureCompatibilityVersion (fcv) is set to "4.0" or greater, newly opened change streams return a hex-encoded string for the resume token data, i.e. the _id._data value. This change allows for the ability to compare and sort the resume tokens. If the fcv is 3.6, newly opened change streams return a BinData for the resume token data.

IMPORTANT

The fcv value at the time of the cursor’s opening determine the resume token data type. That is, the modification of the fcv does not affect the resume tokens for change streams already opened before the fcv change.

Regardless of the fcv value, a 4.0 replica set or a sharded cluster can resume a change stream using either the BinData or string resume token.

As such, a 4.0 deployment can use a resume token from a change stream opened on a collection from a 3.6 deployment.

Implementation

Structure of a Mongo change event

{
   _id : { <BSON Object> },
   "operationType" : "<operation>",
   "fullDocument" : { <document> },
   "ns" : {
      "db" : "<database>",
      "coll" : "<collection"
   },
   "documentKey" : { "_id" : <ObjectId> },
   "updateDescription" : {
      "updatedFields" : { <document> },
      "removedFields" : [ "<field>", ... ]
   }
   "clusterTime" : <Timestamp>,
   "txnNumber" : <NumberLong>,
   "lsid" : {
      "id" : <UUID>,
      "uid" : <BinData>
   }
}

Pay attention to _id

Metadata related to the operation.

Use this document as a resumeToken for the resumeAfter parameter when resuming a change stream.

If the featureCompatibilityVersion (fcv) is set to "4.0" or greater, newly opened change streams return a hex-encoded string for the resume token data, i.e. the _id._data value. This change allows for the ability to compare and sort the resume tokens. If the fcv is 3.6, newly opened change streams return a BinData for the resume token data.

IMPORTANT

The fcv value at the time of the cursor’s opening determine the resume token data type. That is, the modification of the fcv does not affect the resume tokens for change streams already opened before the fcv change.

Regardless of the fcv value, a 4.0 replica set or a sharded cluster can resume a change stream using either the BinData or string resume token.

This field is BSON so it

  • Can’t be saved with JSON.stringify
  • Can’t be cast to ObjectID (wrong format, different than MongoDB documentation )

Solution: Include bson

const BSON = require(‘bson’);

function saveId(cb) {
if (lastId) {
let lastIdBuffer = bson.serialize(lastId);
fs.writeFile(ID_FILE, lastIdBuffer, (err) => {
if (err) {
logger.error(‘[saveId]’, err);
}
return cb && cb(err);
});
}
}



function loadId(cb) {
fs.readFile(ID_FILE, (err, data) => {
let buffer;
if (!err && data) {
buffer = bson.deserialize(data);
}
return cb && cb(err, buffer);
});
}

  • Every one second, write the latest _id to disk
  • Reload the _id object from disk on startup
  • Use it to resume the change stream with

const pipeline = [
{
$match: { ‘ns.db’: config.MONGO_DB_NAME },
}
];

let changeStreamOptions = {};

changeStreamOptions[‘resumeAfter’] = resumeToken;

const changeStream = db.watch(pipeline, changeStreamOptions);
changeStream.on(‘change’, (change) => {

}

Reference

db.watch(pipeline, options)

New in version 4.0: Requires featureCompatibilityVersion (fCV) set to "4.0" or greater. For more information on fCV, see setFeatureCompatibilityVersion.

Opens a change stream cursor for a database to report on all its non-system collections.

A sequence of one or more of the following aggregation stages:

See Aggregation for complete documentation on the aggregation framework.

Optional. Additional options that modify the behavior of db.watch().

You must pass an empty array [] to the pipeline parameter if you are not specifying a pipeline but are passing the options document.

The options document can contain the following fields and values:

Optional. Directs db.watch() to attempt resuming notifications starting after the operation specified in the resume token.

Each change stream event document includes a resume token as the _idfield. Pass the entire _id field of the change event document that represents the operation you want to resume after.

resumeAfter is mutually exclusive with startAtOperationTime.

Optional. By default, db.watch() returns the delta of those fields modified by an update operation, instead of the entire updated document.

Set fullDocument to "updateLookup" to direct db.watch() to look up the most current majority-committed version of the updated document. db.watch() returns a fullDocument field with the document lookup in addition to the updateDescription delta.

Optional. Specifies the maximum number of change events to return in each batch of the response from the MongoDB cluster.

Has the same functionality as cursor.batchSize().

Optional. The maximum amount of time in milliseconds the server waits for new data changes to report to the change stream cursor before returning an empty batch.

Defaults to 1000 milliseconds.

Optional. The starting point for the change stream. If the specified starting point is in the past, it must be in the time range of the oplog. To check the time range of the oplog, see rs.printReplicationInfo().

startAtOperationTime is mutually exclusive with resumeAfter.

SEE ALSO

db.collection.watch() and Mongo.watch()

To generate a new ObjectId, use ObjectId() with no argument:

copy
copied

In this example, the value of x would be:

copy
copied

To generate a new ObjectId using ObjectId() with a unique hexadecimal string:

copy
copied

In this example, the value of y would be:

copy
copied

Access the str attribute of an ObjectId() object, as follows:

copy
copied

This operation will return the following hexadecimal string:

copy
copied

Create a new ObjectID instance

class ObjectID()Arguments:id (string) – Can be a 24 byte hex string, 12 byte binary string or a Number.Returns:object instance of ObjectID.

Return the ObjectID id as a 24 byte hex string representation

toHexString()Returns:string return the 24 byte hex string representation.

Examples

Generate a 24 character hex string representation of the ObjectID

Random notes trying to install s3fs and backblaze support on Linux

B2 support for duplicity

Must be installed with pip for python 2

apt-get install python-pip
pip install b2

apt-get install duplicity
add-apt-repository ppa:duplicity-team/ppa
apt-get update
apt-get --only-upgrade install duplicity

duplicity /mount/point b2://<box's application ID>:<key>@<box id>

Worth noting that the documentation on backblaze themselves is wrong, you must use your bucket’s app id, not your master app id

Installing s3fs on ubuntu

libfuse must be installed from source or else you’ll get “modprobe: ERROR: ../libkmod/libkmod.c:514 lookup_builtin_file() could not open builtin file ‘/lib/modules/2.6.32-042stab131.1/modules.builtin.bin'”

wget https://github.com/libfuse/libfuse/archive/fuse-2.9.8.tar.gz
tar xvf fuse-2.9.8.tar.gz
cd libfuse-fuse-2.9.8
./configure

Opps… error! 

zsh: no such file or directory: ./configure

I noticed there’s configure.ac file, so it should be

autoconf

Opps… error again

configure.ac:6: error: possibly undefined macro: AM_INIT_AUTOMAKE
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
configure.ac:10: error: possibly undefined macro: AC_PROG_LIBTOOL
configure.ac:13: error: possibly undefined macro: AM_PROG_CC_C_O
configure.ac:73: error: possibly undefined macro: AM_ICONV
configure.ac:75: error: possibly undefined macro: AM_CONDITIONAL

Let’s try this

> ./makeconf.sh
Running libtoolize…
libtoolize: putting auxiliary files in '.'.
libtoolize: copying file './ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
Running autoreconf…
configure.ac:10: installing './compile'
configure.ac:5: installing './config.guess'
configure.ac:5: installing './config.sub'
configure.ac:6: installing './install-sh'
configure.ac:6: installing './missing'
example/Makefile.am: installing './depcomp'

Cool, seems to work well

./configure
make
make install

Okay… now let’s try

>s3fs bucket /mount/point
fuse: failed to open /dev/fuse: Operation not permitted

Damn…

I’m going to use aws-cli instead. From https://gist.github.com/keeross/b19b6e5603c78073d58b

aws-cli to backup vesta to Amazon S3

  • Install pip (if not installed already):
$ curl -O https://bootstrap.pypa.io/get-pip.py
  • Install AWS cli tool:
$ sudo pip install awscli
  • Configure AWS cli tool:
$ aws configure
AWS Access Key ID [None]: <YOUR_AWESOME_KEY>
AWS Secret Access Key [None]: <YOUR_AWESOME_KEY>
Default region name [None]: us-west-2
Default output format [None]: json
  • Backup old file, then open and edit Vesta command v-backup-user:
$ cp /usr/local/vesta/bin/v-backup-user /usr/local/vesta/bin/v-backup-user_backup
$ sudo nano /usr/local/vesta/bin/v-backup-user
  • Somewhere in Variable&Function section add:
# AWS arguments
bucket_name=<your_awesome_bucket_name>
  • Search for # Creating final tarball section and right after this line chown admin:$user $BACKUP/$user.$date.tar add new code:
# AWS UPLOAD
aws s3api put-object --bucket $bucket_name --key $user/$user.$date.tar --body $BACKUP/$user.$date.tar --storage-class REDUCED_REDUNDANCY
echo -e "$(date "+%F %T") AWS: Uploaded and backed. $user.$date.tar"
msg="$msg\n$(date "+%F %T") AWS: Uploaded and backed. $user.$date.tar"

Error mapping file into docker container

Specifically, for Docker on Windows, Virtualbox version, in conjunction with WSL. When I run ‘docker-compose up’, I get this error

ERROR: for 125f20ccead0_lempr_web_1 Cannot start service web: OCI runtime create failed: container_linux.go:348: starting container process caused “process_linux.go:402: container init caused \”rootfs_linux.go:58: mounting \\\”/d/Code/lempr/web/nginx.conf\\\” to rootfs \\\”/mnt/sda1/var/lib/docker/aufs/mnt/065f6aacb3ee01072637ae0544f50a9b772df662ea8e7e3dc2d663d87d7e1a4f\\\” at \\\”/mnt/sda1/var/lib/docker/aufs/mnt/065f6aacb3ee01072637ae0544f50a9b772df662ea8e7e3dc2d663d87d7e1a4f/etc/nginx/nginx.conf\\\” caused \\\”not a directory\\\”\””: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type

I checked to make sure nginx.conf exist both outside and inside the container.

Then I tried to map the file to a different location, it gets created as a directory inside the container!

I tried various suggestions from the internet:

  • Restart the host machine
  • Map /mnt/c to /c
  • Share the /c drive to the docker-machine VM inside virtual box
    • To do so, open virtualbox
    • Select the ‘default’ machine
    • Go to settings
    • Go to ‘shared folders’
    • Click add
    • Add C and D as shares
    • Attach a console to the machine to make sure the share works

To no avail. Meanwhile the same docker-compose works just fine on my Mac.

So I concluded WSL mapping and docker for Windows’ mapping can’t handle this use case at all and I should switch to a *nix OS for the task.

Windows seems to be unable to handle this setup

Docker-Machine (in Virtualbox) -> Windows -> WSL

It seems there’s a problem with mapping paths from the docker-compose VM to Windows, not a problem with WSL, since trying within Docker’s MINGW command line yields the same result as WSL’s.

 

How to set up squid

On Ubuntu

Basic squid conf

/etc/squid3/squid.conf instead of the super bloated default config file

auth_param basic program /usr/lib/squid3/basic_ncsa_auth /etc/squid3/passwords
auth_param basic realm proxy
acl authenticated proxy_auth REQUIRED
http_access allow authenticated

# Choose the port you want. Below we set it to default 3128.
http_port 3128

Setting up a user

sudo htpasswd -c /etc/squid3/passwords username_you_like

and enter a password twice for the chosen username then

sudo service squid3 restart

How to install tinyproxy, the comprehensive guide

You should build from source, since the latest version on Ubuntu repository doesn’t support authentication yet

git clone https://github.com/tinyproxy/tinyproxy.git
sudo apt-get install automake cmake asciidoc 

cd tinyproxy
./autogen.sh
make && make install

Add authentication

vi /etc/tinyproxy.conf
-----
BasicAuth user password
Allow your.local.ip.address
-----
sudo /etc/init.d/tinyproxy restart