Getting started with Apache Cassandra in Python


Apache Cassandra is a highly scalable NoSQL database with tunable consistency, used by large organizations across a wide range of application domains. Getting started with Cassandra in Python can be an intensive and time-consuming task for any developer or data scientist, because Cassandra has a steep learning curve and comes with an array of tools and configuration options. This blog covers what Apache Cassandra is and how it differs from relational databases, then walks through a simple Python example that sets up a Cassandra database and a sample application you can experiment with on your own.
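"Tunable consistency" means each read or write chooses how many replicas must acknowledge it. The following minimal sketch assumes a single datacenter and models only the ONE, QUORUM and ALL consistency levels; it shows the usual rule of thumb that when the replicas read plus the replicas written exceed the replication factor, every read overlaps the most recent write.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge a request at this consistency level.

    Assumption: single datacenter; only ONE, QUORUM and ALL are modeled.
    """
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a majority of the rf replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


if __name__ == "__main__":
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, reads_see_latest_write(read, write, rf=3))
```

With a replication factor of 3, ONE/ONE is fast but may return stale data, while QUORUM/QUORUM (2 + 2 > 3) and ONE/ALL (1 + 3 > 3) both guarantee overlap; the trade-off is that more replicas must be reachable for each request to succeed.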

We'll walk you through setting up Python and DataStax Astra, a Cassandra-as-a-Service offering that runs on Google Cloud, Amazon Web Services, or Microsoft Azure and has a free tier. We'll show you how to connect to Astra from Python, insert a JSON document through the Astra Document API using the astrapy client, and then read the data back with curl.

What is Apache Cassandra?

Cassandra is a popular distributed database. It was originally developed at Facebook to power the inbox search feature, was open-sourced in 2008, and is now an Apache Software Foundation project. Cassandra is written in Java and is designed to scale horizontally: data is partitioned and replicated across many nodes, so there is no single point of failure and capacity grows by adding machines. Unlike relational databases, Cassandra does not support joins; instead of SQL it uses the Cassandra Query Language (CQL), which looks similar to SQL but operates on a partitioned, wide-column data model that you design around your queries. Cassandra runs on many platforms and is often chosen as an open-source alternative to databases such as Oracle, MongoDB, and Amazon DynamoDB.
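To make "spread over multiple nodes" concrete, here is a toy model of a token ring. It is a sketch under stated assumptions, not Cassandra's implementation: one token per node (no virtual nodes), MD5 hashing (as in Cassandra's older RandomPartitioner; the current default, Murmur3Partitioner, uses MurmurHash3), and SimpleStrategy-style placement where replicas go to the next nodes clockwise. The class and node names are hypothetical.

```python
from __future__ import annotations

import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key to a 128-bit position on the ring (MD5 as a stand-in)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """A toy ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes: list[str]) -> None:
        # Sort nodes by token; each node owns the range ending at its token
        self.ring = sorted((token_for(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    def replicas(self, partition_key: str, rf: int) -> list[str]:
        """Return up to rf distinct nodes that store this partition."""
        rf = min(rf, len(self.ring))
        # The first node whose token is >= the key's token owns the partition
        start = bisect.bisect_left(self.tokens, token_for(partition_key))
        # Walk clockwise, wrapping past the highest token back to the lowest
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("The Hunger Games", rf=3))
```

Because placement depends only on the hash of the partition key, any node can compute where a row lives without a central coordinator, which is what lets a cluster grow by adding nodes.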

Prerequisites

Before we get started, sign up for a DataStax Astra account by filling out the form on the DataStax website. After that, install the modules needed for the example application; we will install only the software our Python app needs to run.

As mentioned above, we will use DataStax Astra, a managed Cassandra-as-a-Service offering, along with a few Python modules. You can install them all at once: put the following lines in a requirements.txt file and run pip install -r requirements.txt.

cassandra-driver==3.25.0
numpy==1.19.3
astrapy==0.0.2
simplejson==3.17.2
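After running pip, you can confirm that the pinned versions actually ended up in your environment. This is a small sketch using the standard library's importlib.metadata (Python 3.8+); check_requirements is a hypothetical helper, and the pins are simply copied from the requirements.txt above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements.txt above
REQUIRED = {
    "cassandra-driver": "3.25.0",
    "numpy": "1.19.3",
    "astrapy": "0.0.2",
    "simplejson": "3.17.2",
}


def check_requirements(required: dict) -> dict:
    """Map each package to (installed version or None, matches the pin?)."""
    report = {}
    for package, pinned in required.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        report[package] = (installed, installed == pinned)
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_requirements(REQUIRED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{package}: installed={installed} pinned={REQUIRED[package]} {status}")
```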

Setting up a Cassandra Database

Cassandra is free and open-source software, so you don't need to pay to deploy it on your own infrastructure. If you are comfortable with servers and databases and want to handle the installation and configuration yourself, you can follow the detailed instructions in the documentation; however, the process can be intimidating, especially for first-time users. Alternatively, you can deploy Cassandra in the cloud, which is what we are going to do here.

For beginners, the easiest way to get started with Cassandra is a managed Cassandra database available on the web. DataStax Astra is a database-as-a-service that you can launch with the click of a button. DataStax's cloud offerings are built on Apache Cassandra, and at the time of writing the free tier included up to 20 million writes per month and 80 GB of data storage, which is enough to try out the service or run a small application.
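To put the monthly write quota in perspective, a quick calculation helps. This sketch uses the free-tier figures quoted in this post (they may have changed since) and assumes a 30-day month; fits_quota is a hypothetical helper for checking a planned steady write rate.

```python
# Free-tier figure as quoted in this post; check current DataStax pricing
WRITES_PER_MONTH = 20_000_000
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def average_rate(writes_per_month: int, seconds: int = SECONDS_PER_MONTH) -> float:
    """Average sustained writes per second that stays within the monthly quota."""
    return writes_per_month / seconds


def fits_quota(writes_per_second: float, quota: int = WRITES_PER_MONTH,
               seconds: int = SECONDS_PER_MONTH) -> bool:
    """True if a constant write rate stays within the monthly quota."""
    return writes_per_second * seconds <= quota


if __name__ == "__main__":
    print(f"{average_rate(WRITES_PER_MONTH):.1f} writes/s on average")
```

That works out to roughly 7.7 writes per second, or about 667,000 writes per day, sustained for the whole month: plenty for a demo like the one below.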

First, we need our Astra DB IDs and an application token. To get them, go to the DataStax Astra homepage, click the Create Database button in the sidebar, and fill out the form.

You can choose any of the three hosting providers: Google Cloud, Amazon Web Services, or Microsoft Azure. All of them are available on the free tier and require no extra setup. I chose Google Cloud because, at the time of writing, it was the only one serving the Indian region. After filling out the details, click the Create Database button; you will be redirected to the home page, where you can click the database you just created in the sidebar.

On the database page, click the Connect tab in the navigation bar.

On this page, go to the Connect using an API >> Document API section and copy the environment variables you need to export from the right side of the page.

To get the application token, go to Organization Settings >> Token Management.

Here you generate an application token by choosing a user role; the token receives that role's permissions. Since we are only building a small demo application, the Read/Write User role is enough. The permissions for each role are listed on the page, so you can easily choose a different one if you need to. After selecting the role, click the Generate Token button.

If the token was created successfully, the new credentials appear on the page.

The result contains a Token along with a Client ID and a Client Secret. Download them and keep them in a safe place, because the website will not show them again. Put the Token value in ASTRA_DB_APPLICATION_TOKEN; your environment variables should then look like this:

export ASTRA_DB_ID=57482bbd-c72e-4ba4-afdc-7c4e230701c4
export ASTRA_DB_REGION=asia-south1
export ASTRA_DB_KEYSPACE=cassandra_keyspace
export ASTRA_DB_COLLECTION=main
export ASTRA_DB_APPLICATION_TOKEN=<your-application-token>
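Rather than pasting these values (especially the token) into source code, a script can read them from the environment. The following is a minimal sketch with a hypothetical helper, load_astra_config, that fails early when a variable is missing. It assumes the variable names exported above and treats ASTRA_DB_COLLECTION as optional, defaulting to "main", the collection name used by the curl command later in this post.

```python
import os

# Variable names match the export lines above
REQUIRED_VARS = (
    "ASTRA_DB_ID",
    "ASTRA_DB_REGION",
    "ASTRA_DB_KEYSPACE",
    "ASTRA_DB_APPLICATION_TOKEN",
)


def load_astra_config(environ=os.environ) -> dict:
    """Read Astra settings from the environment, failing early if any are missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    config = {name: environ[name] for name in REQUIRED_VARS}
    # Optional: fall back to the "main" collection (an assumption of this sketch)
    config["ASTRA_DB_COLLECTION"] = environ.get("ASTRA_DB_COLLECTION", "main")
    return config
```

Passing a plain dict as environ makes the helper easy to test without touching your real shell environment; avoid printing config["ASTRA_DB_APPLICATION_TOKEN"] in logs.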

The Document API lets you store JSON documents in the Astra database you created without defining a schema, so no data modeling is required up front!
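Every document lives at a predictable REST address: the database ID and region form the host name, and the keyspace (called a namespace in the URL), collection, and document ID form the path, as both the Python code and the curl command in this post show. This sketch builds that URL; document_url is a hypothetical helper, and it assumes the v2 path format used in this post.

```python
from urllib.parse import quote


def document_url(db_id: str, region: str, keyspace: str, collection: str,
                 doc_id: str = "") -> str:
    """Build a Document API v2 URL; omit doc_id to address the whole collection."""
    base = f"https://{db_id}-{region}.apps.astra.datastax.com"
    # Percent-encode each path segment in case names contain special characters
    path = f"/api/rest/v2/namespaces/{quote(keyspace)}/collections/{quote(collection)}"
    if doc_id:
        path += f"/{quote(doc_id)}"
    return base + path
```

Since no schema is declared, two documents in the same collection may have entirely different fields; the URL is the only structure the API imposes.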

Creating a small application 

Once you have completed the steps above, we can build a very basic application to experiment with. First, we authenticate with DataStax Astra using token authentication. We then pass the resulting HTTP client object to a second function, which creates a JSON document and inserts it into an Astra collection. The code can be written as follows:

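import os
import uuid

from astrapy.rest import create_client, http_methods


def getAstraHTTPClient():
    """Create an Astra HTTP client from the exported environment variables."""
    # Read credentials from the environment instead of hardcoding them
    ASTRA_DB_ID = os.environ["ASTRA_DB_ID"]
    ASTRA_DB_REGION = os.environ["ASTRA_DB_REGION"]
    ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]

    # Set up an Astra client that authenticates with the application token
    return create_client(astra_database_id=ASTRA_DB_ID,
                         astra_database_region=ASTRA_DB_REGION,
                         astra_application_token=ASTRA_DB_APPLICATION_TOKEN)


def createJSONonAstra(astra_http_client):
    """Insert one JSON document into an Astra collection under a new UUID."""
    doc_uuid = uuid.uuid4()
    ASTRA_DB_KEYSPACE = os.environ["ASTRA_DB_KEYSPACE"]
    # Default to the "main" collection that the curl command below queries
    ASTRA_DB_COLLECTION = os.environ.get("ASTRA_DB_COLLECTION", "main")

    # PUT stores the document under the given document ID
    return astra_http_client.request(
        method=http_methods.PUT,
        path=f"/api/rest/v2/namespaces/{ASTRA_DB_KEYSPACE}/collections/{ASTRA_DB_COLLECTION}/{doc_uuid}",
        json_data={
            "book": "The Hunger Games",
            "author": "Suzanne Collins",
            "genre": ["fiction"],
        })


if __name__ == "__main__":
    # Authenticate, insert the document, and print the response from the client
    client = getAstraHTTPClient()
    print(createJSONonAstra(client))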
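Under the hood, the astrapy client sends an ordinary HTTPS request. As a rough illustration, assuming only the path used above and the X-Cassandra-Token header that the curl command below also sends, here is a standard-library sketch that builds the equivalent PUT request without sending it. build_put_request is a hypothetical helper, not part of astrapy.

```python
import json
import urllib.request
import uuid


def build_put_request(db_id, region, keyspace, collection, token, document, doc_id=None):
    """Build (but do not send) a Document API PUT storing `document` under `doc_id`."""
    doc_id = doc_id or str(uuid.uuid4())  # new random ID, as in createJSONonAstra
    url = (f"https://{db_id}-{region}.apps.astra.datastax.com"
           f"/api/rest/v2/namespaces/{keyspace}/collections/{collection}/{doc_id}")
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),  # JSON body
        method="PUT",
        headers={
            "X-Cassandra-Token": token,  # same auth header the curl example uses
            "Content-Type": "application/json",
        },
    )
```

To actually send it, pass the request to urllib.request.urlopen; using astrapy as above saves you from assembling URLs and headers by hand.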

To send this document to Astra, save the code as main.py and run it:

python3 main.py  # Linux/macOS
python main.py   # Windows

If the script runs without errors, the operation succeeded. Finally, we can confirm that the document was inserted by retrieving it from the command line with curl:

curl --request GET \
  --url "https://$ASTRA_DB_ID-$ASTRA_DB_REGION.apps.astra.datastax.com/api/rest/v2/namespaces/$ASTRA_DB_KEYSPACE/collections/main" \
  -H "X-Cassandra-Token: $ASTRA_DB_APPLICATION_TOKEN" \
  -H 'Content-Type: application/json'
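If you would rather stay in Python, the same GET can be sent with the standard library. This is a sketch assuming the environment variables exported earlier and the "main" collection from the curl command; build_get_request and fetch_documents are hypothetical helpers.

```python
import json
import os
import urllib.request


def build_get_request(db_id, region, keyspace, collection, token):
    """Build a GET for the documents in a collection, mirroring the curl command."""
    url = (f"https://{db_id}-{region}.apps.astra.datastax.com"
           f"/api/rest/v2/namespaces/{keyspace}/collections/{collection}")
    return urllib.request.Request(url, headers={"X-Cassandra-Token": token})


def fetch_documents(timeout=30):
    """Send the request using the exported ASTRA_* variables and decode the JSON."""
    req = build_get_request(
        os.environ["ASTRA_DB_ID"],
        os.environ["ASTRA_DB_REGION"],
        os.environ["ASTRA_DB_KEYSPACE"],
        os.environ.get("ASTRA_DB_COLLECTION", "main"),  # assumed default
        os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    )
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read())
```

Calling fetch_documents() should return the same JSON that curl prints; the timeout keeps the script from hanging if the request is slow.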

Note: It may take some time to get the results back. 

If everything worked properly you will see a result like this:

{
  "pageState": "3Bykb2N1bWVudElkIjoiNTNhMzRmYzItZjg1ZC00NWE4LTgwNmQtYTJkMTk0MDA0ZmYxIiwiaW50ZXJuYGsDdASdjh2q3IifQ==",
  "data": {
    "53a34fc2-f85d-45a8-806d-a2d194004ff1": {
      "author": "Suzanne Collins",
      "book": "The Hunger Games",
      "genre": [
        "fiction"
      ]
    }
  }
}  
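In the response, data maps each document ID to its JSON body, and pageState is an opaque cursor for fetching further results (see the Document API reference for how to pass it back). The following sketch parses a trimmed copy of the sample above; summarize is a hypothetical helper, and the pageState value is replaced with a placeholder.

```python
import json

# Trimmed copy of the sample response above (pageState replaced by a placeholder)
SAMPLE = """{
  "pageState": "opaque-cursor",
  "data": {
    "53a34fc2-f85d-45a8-806d-a2d194004ff1": {
      "author": "Suzanne Collins",
      "book": "The Hunger Games",
      "genre": ["fiction"]
    }
  }
}"""


def summarize(response_text: str):
    """Return ([(doc_id, book, author), ...], page_state) from a response body."""
    payload = json.loads(response_text)
    rows = [
        (doc_id, doc.get("book"), doc.get("author"))
        for doc_id, doc in payload.get("data", {}).items()
    ]
    # pageState may be absent when there are no further results
    return rows, payload.get("pageState")


if __name__ == "__main__":
    rows, page_state = summarize(SAMPLE)
    for doc_id, book, author in rows:
        print(f"{doc_id}: {book} by {author}")
```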

Conclusion

Apache Cassandra is a free and open-source distributed database management system used by enterprises, startups, and small businesses alike. It scales horizontally to very large datasets while keeping latency low. In this blog, we set up a Cassandra database on DataStax Astra, inserted a JSON document from Python, and retrieved it with curl. These are the most fundamental operations, and we hope these steps help you build on them. But don't stop your journey there! For more depth, the official Apache Cassandra documentation and the DataStax Astra documentation are the best places to look for help.
