With my recent move to Couchbase I’ve also made the leap from Big Data & SQL Analytics to NoSQL and document databases. I’m going to ease into blogging more about it here (and in my channel on the corporate blog), so as a first step let’s go through a few basic concepts. Then we’ll dive directly into an ultra simple Python example of storing and accessing a JSON document in Couchbase.
This is somewhat stream of consciousness (okay it’s a total ramble!) but I just want to get your juices flowing and hear what other areas you’d be interested in.
Growing Beyond NoSQL
When the term NoSQL was coined it referred, somewhat derogatorily, to alternatives to the standard SQL databases. Oddly enough, the growth of NoSQL solutions has stretched the term to encompass much more than initially imagined.
For example, many SQL database vendors now offer document management capabilities too, so what does the label even mean?
Data Platform > NoSQL
This is why Couchbase refers to its offering as a Data Platform rather than just a NoSQL database: the NoSQL label no longer means much on its own.
Many users came to NoSQL for simplified access to binary objects in a database-like way. That is, you specify a key/ID and get back the relevant object for your application. This is often how caching layers between a database and a web app work.
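To make the key/ID idea concrete, here is a minimal sketch of a caching layer in plain Python. It does not talk to Couchbase at all: the `SLOW_DATABASE` and `cache` dicts and the `fetch_user` function are made-up stand-ins, just to show the shape of the pattern.

```python
# A stand-in for a slow backing database (hypothetical data).
SLOW_DATABASE = {"user::42": {"name": "Ada", "plan": "pro"}}

# A plain dict standing in for the key/value cache layer.
cache = {}

def fetch_user(key):
    """Return (object, source) for key, checking the cache before the database."""
    if key in cache:
        return cache[key], "cache"
    value = SLOW_DATABASE[key]  # cache miss: fall back to the database
    cache[key] = value          # remember it so the next lookup is fast
    return value, "database"

first = fetch_user("user::42")   # served by the database
second = fetch_user("user::42")  # served by the cache
print(first[1], second[1])
```

The app only ever needs to know the key; where the object actually comes from is hidden behind one function.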
Flexibility With Structure
The most interesting part of the evolution is the ability to access specific parts of the JSON document in the database. But wait, isn’t this why we got away from SQL databases? Yes and no.
SQL databases depend on a static schema, often highly normalized, which introduces more complexity than many web apps really want to deal with. Need to add a new field or table to your application? The schema changes can be onerous to handle as they trickle down through database views, middleware query design and ultimately to the end user experience – all layers need updates.
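By contrast, here is a rough sketch of what a "new field" looks like when your records are JSON-style documents. The documents, the `twitter` field and the `display_name` helper are all invented for illustration; the point is only that older documents keep working without a migration.

```python
import json

# Two documents of the same kind; the second one gained a field later on.
older_doc = {"type": "user", "name": "Ada"}
newer_doc = {"type": "user", "name": "Grace", "twitter": "@grace"}

def display_name(doc):
    # .get() tolerates documents that were written before the field existed
    handle = doc.get("twitter")
    if handle is None:
        return doc["name"]
    return "{} ({})".format(doc["name"], handle)

labels = [display_name(d) for d in (older_doc, newer_doc)]
print(json.dumps(labels))
```

The application code decides how to treat a missing field, so nothing else has to change when a new one shows up.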
Back to SQL
To further complicate the NoSQL moniker, there are real strengths to being able to use SQL to aggregate records of data. N1QL (pronounced “nickel”, short for Non-first Normal Form Query Language) allows us to query across sets of JSON documents in the database using SQL-like syntax. It’s pretty sweet to use actually! If some documents lack the fields a query filters on, they are simply skipped.
I’ll show you more of this in the future, though there is a lot on our website about it already.
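As a taste of the "missing fields are skipped" behaviour, here is a small pure-Python emulation. The N1QL statement is only shown as a string and is not executed; the bucket name, field names and sample documents are assumptions for the sake of the example.

```python
# Illustrative N1QL statement (assumed bucket and field names; not executed here).
STATEMENT = "SELECT name, age FROM `default` WHERE age > 30"

docs = [
    {"name": "Ada", "age": 36},
    {"name": "Grace"},             # no "age" field at all
    {"name": "Linus", "age": 28},
]

# Emulate the WHERE clause: a document with no "age" cannot satisfy
# the predicate, so it drops out of the result instead of raising an error.
results = [
    {"name": d["name"], "age": d["age"]}
    for d in docs
    if "age" in d and d["age"] > 30
]
print(results)
```

Only the document that has an `age` field and passes the filter comes back; the one without the field never causes an error.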
So that’s the background, now let’s do a simple example…
Python NoSQL Access to Couchbase Server
- Install Couchbase Server – some instructions here if needed.
- Ultra simple for OS X users: just launch the DMG and get the enterprise trial for the latest features
- It’s also very simple with Docker (user/password is Administrator/password):
docker run -t --name db -p 8091-8094:8091-8094 -p 11210:11210 couchbase/server:sandbox
These drop you into a web management console at http://localhost:8091.
You can pretty much accept all the defaults for the purposes of this walkthrough. Just make sure you give it enough RAM if prompted. On a Mac, the Couchbase icon should be in your menu bar, with a handy link to stop and relaunch the web console.
Couchbase Python Packages
I’m on a Mac so I will stick with Mac instructions here. There are only two simple commands you need to get started with Python and Couchbase. Get libcouchbase (the C library) and the Couchbase Python module:
brew install libcouchbase
pip install couchbase
Test that it’s installed by launching Python and importing the couchbase module. If the import returns without an error, you are good to go:
>>> import couchbase
Now let’s do something more interesting.
Basic Couchbase Python Document Example
As an aside, you may have noticed that Couchbase uses a term called Buckets – a loose collection of documents. For now, think of a bucket as loosely analogous to a traditional database table, except that there is really no limit on what kinds of documents you can put in the same bucket. Some documents will have some fields, some documents will not. It’s a bit of a mind-bender but there isn’t a much easier way to explain it without just trying it out.
A bucket called default is created at install time, so we’ll use it for now. We’ll connect to it, then take a Python list and save (set) it to the database with a custom ID. Then we get the document back as a list and iterate over it.
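>>> from couchbase.bucket import Bucket
>>> # connect to the default bucket on the local server
>>> db = Bucket("couchbase://localhost/default")
>>> vip_people = ["me", "Myself", "I"]
>>> # save the list under a custom document ID
>>> db.set("tyler::friends", vip_people)
>>> # fetch it back; .value holds the stored list
>>> mydoc = db.get("tyler::friends").value
>>> for d in mydoc: print(d)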
The cool thing is that most use cases aren’t much different than this! Get, set and operate on elements of a JSON document.
You can open the web console and see the document sitting in the default bucket. Just search for the document ID I used: tyler::friends.
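If you are wondering what the stored document looks like, here is a small sketch using only the standard json module. It assumes the SDK serializes values as JSON by default, which is my understanding but worth confirming in the SDK docs; nothing here touches the server.

```python
import json

vip_people = ["me", "Myself", "I"]

# Roughly what ends up stored under the document ID, assuming JSON serialization.
stored = json.dumps(vip_people)

# Reading it back gives an ordinary Python list again.
restored = json.loads(stored)

print(stored)
print(restored == vip_people)
```

So a Python list goes in as a JSON array and comes back out as a list, which is why iterating over it just works.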
There are still a ton of other options that I’ll touch on in a future blog post – for example, getting the server to do all the work of managing collections frameworks in .NET or Java.
Want to learn more? Leave a comment and I’ll go in that direction! In particular I’m planning to show more about doing NoSQL operations in Spark, Zeppelin and Apache Kafka. Interested?
See my conference presentation video for hints on what will come next 🙂