Serverless Structured Logging – Part 1 – The infrastructure

Introduction

We all have to deal with logging. Whether it spans a distributed environment or multiple software components, it’s usually a big task.

And it’s not just the logging part – you also have to analyze the logs to extract metrics and insights. To make this analytical part easier, companies are adopting the structured logging approach, where log events are not plain text messages but structured data in a format like JSON, Avro or Protobuf.

This article describes a structured logging system that requires no servers, in line with the increasingly popular serverless approach to data-intensive problems. It’s easy to create such a system with AWS (Amazon Web Services). As we will see, the system scales automatically and its cost grows linearly with usage, starting from $0.

We are also offering an open-source script and logging library below, which automate this job, enabling developers to create a logging system in seconds and to analyze the logs with SQL or any other analytical tool. In Part 1, we explain the architecture and the usage. In Part 2, we’ll show how logs can be analyzed without the need for servers. In Part 3, we’ll build a real-time analytics component for the logging system.

Architecture

We utilize two AWS services for the logging system:

  1. Kinesis Firehose is a managed delivery stream that buffers the data, optionally compresses and encrypts it, and delivers it to S3, the only other service we’ll use.
  2. S3 is a cheap storage service from AWS that is able to scale practically without limit.

Firehose can be configured to deliver data to S3, Redshift (AWS’s data warehousing service) or to Elasticsearch. We chose the S3 destination because of its pricing and our usage pattern outlined below. A logging system’s data volume can grow rapidly; with S3 the cost stays very low even when you have terabytes of data. The access pattern for log analysis is either ad hoc, querying the data whenever we need an answer, or real-time, usually powering alerts or real-time analytical needs. S3 provides great long-term storage, which can power ad-hoc queries too, and with Kinesis Analytics we also have a real-time analytical option whenever needed. We will discuss analytics in our upcoming blog posts.

 

The overview of this infrastructure – as simple as possible:

Creating the logging infrastructure

 

You could manually create this infrastructure using the AWS Management Console, but we wrote a tool to automate the process. We open-sourced it on GitHub to make life easier for all of you.

After installing the dependencies, running bootstrap.py sets up the necessary resources:

The required parameters are the following (a short boto3 sketch after the list shows how they map onto the Firehose API):

  1. AccountID
    Needed for the IAM policies the script creates. The script looks up the current user’s account ID and offers it as the default.
  2. Bucket name
    The name of the S3 bucket that will store the logs.
  3. Stream name
    The name of the Firehose delivery stream. This name will be used to initialize the logging client.
  4. Role name
    The name of the IAM role attached to the Firehose stream. If this role already exists, it will be reused; otherwise it will be created.
  5. Log rotation period
    How long Firehose buffers log entries before delivering them. If it’s 300 seconds (5 minutes), buffered log entries are delivered to S3 every 5 minutes, unless the buffer size limit (see below) is reached first.
  6. Log buffer size
    The maximum buffer size. If the buffer reaches this size before the rotation period expires, it is written out immediately.
  7. Compression
    Compression of the log files. Available options: gzip (default), snappy, zip or uncompressed.
  8. S3 bucket prefix
    The prefix used to store the files inside the bucket. The default is raw/. Please note that if the prefix ends with /, it will show up in the S3 UI as a folder-like prefix; if it doesn’t, it becomes part of what appears to be the filename. (It only appears as such because on S3 each object has just a bucket and a key, and the prefix is always part of the key.)
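Under the hood, these parameters map almost one-to-one onto the Firehose API. As an illustration – a hedged sketch using boto3, not the actual bootstrap.py code, with made-up bucket, stream and role names – creating the delivery stream could look roughly like this:

import boto3

firehose = boto3.client("firehose")

# Illustrative values only; in practice bootstrap.py collects these as parameters.
firehose.create_delivery_stream(
    DeliveryStreamName="datapao-logging",
    S3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-logging-role",
        "BucketARN": "arn:aws:s3:::datapao-logging-bucket",
        "Prefix": "raw/",                   # S3 bucket prefix
        "CompressionFormat": "GZIP",        # GZIP | Snappy | ZIP | UNCOMPRESSED
        "BufferingHints": {
            "IntervalInSeconds": 300,       # log rotation period
            "SizeInMBs": 5,                 # log buffer size
        },
    },
)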

 

Once we provide the necessary parameters above, the script creates the infrastructure. The Firehose delivery stream needs about a minute to change its state from Creating to Active; you can check this in the AWS Management Console.
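If you prefer the command line to the console, the same status check can be done with boto3 (a small illustrative snippet, not part of the tool):

import boto3

firehose = boto3.client("firehose")

# Look up the current status of the delivery stream: CREATING or ACTIVE.
description = firehose.describe_delivery_stream(DeliveryStreamName="datapao-logging")
print(description["DeliveryStreamDescription"]["DeliveryStreamStatus"])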

 

Once it’s displayed as Active, you have a working logging backend. Not hard, right?

 

You can check the created resources on the Management Console.

The bucket on S3:

The Firehose delivery stream:

The details of our Firehose delivery stream:

Sending test data

There are two ways to test this logging system. You can either integrate our client logging library into your Python application or use the test.py script to send test data to the pipeline.

test.py

We provide a test.py script that sends test messages to the logging system. The name of the logging system is the name of the stream – in our example it’s datapao-logging.

After sending a test message to our newly created datapao-logging stream, we can see the log delivered to the specified S3 location.
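Conceptually, sending a single test record to the delivery stream boils down to one Firehose API call. A minimal, hypothetical version of this (not the actual test.py) could look like this:

import json
import boto3

firehose = boto3.client("firehose")

# Send one JSON log line to the delivery stream. Newline-delimited records
# keep the files on S3 easy to parse later.
record = {"message": "hello from the test script", "code": 200}
firehose.put_record(
    DeliveryStreamName="datapao-logging",
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)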

Client library

We provide a Python web server as an example of integrating the Python client library into your own application. This web server logs the content of incoming requests together with the status code the server returned. Logging this is straightforward:

First, create the logger:

logger = KinesisLogger("datapao-logging")

Then log the requests. If everything was OK:

logger.info({"name": name, "age": age, "code": 200})

In case of an error:

logger.info({"name": name, "code": 400})

For further details, please see the source code.

 

These statements create the following logs on S3:

{"name": null, "code": 400, "Level": "INFO", "Timestamp": "2017-11-20T13:47:44.982625"}
{"age": "12", "name": "brian", "code": 200, "Level": "INFO", "Timestamp": "2017-11-20T13:48:22.960232"}
{"age": "74", "name": "clara", "code": 200, "Level": "INFO", "Timestamp": "2017-11-20T13:48:55.361036"}

The client library automatically attaches a Timestamp and a Level attribute.
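To give a feel for what happens behind logger.info, here is a heavily simplified sketch of such a logger – an illustration under our own assumptions, not the library’s actual code:

import json
import datetime
import boto3

class KinesisLogger:
    """Minimal illustrative logger that ships structured events to Firehose."""

    def __init__(self, stream_name):
        self.stream_name = stream_name
        self.firehose = boto3.client("firehose")

    def _log(self, level, event):
        # Attach the metadata the library adds automatically.
        record = dict(event)
        record["Level"] = level
        record["Timestamp"] = datetime.datetime.utcnow().isoformat()
        self.firehose.put_record(
            DeliveryStreamName=self.stream_name,
            Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
        )

    def info(self, event):
        self._log("INFO", event)

The real library may batch records or handle retries differently; the point is simply that each event ends up as one newline-terminated JSON record in the Firehose stream.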

Cost analysis

Apart from requiring no maintenance, another advantage of this system is its price: the cost is low and grows with the data volume. Let’s see an example.

Say we have messages of 2 KB on average, 2 million of them per day. Over an average 30-day month, this will cost us $4.11.

If the volume grows and we go up to 40M messages a day, the cost grows to $82.20. That’s for 40 million messages per day! That’s very competitive, and we haven’t even mentioned that setting up the whole system took only a minute. Another advantage on the cost side is that if we set up a logging system and don’t use it later, it won’t cost us a cent. Furthermore, since we don’t have to spend on maintenance, the TCO (Total Cost of Ownership) is even lower in comparison with other solutions.
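Both Firehose and S3 bill per unit of data, so the monthly cost scales linearly with message volume – which is where the second figure comes from:

# 40M messages/day is 20x the 2M/day example, so the cost is 20x as well.
base_messages_per_day = 2_000_000
base_monthly_cost_usd = 4.11

scaled_messages_per_day = 40_000_000
scale_factor = scaled_messages_per_day / base_messages_per_day  # 20.0
print(round(base_monthly_cost_usd * scale_factor, 2))           # 82.2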

Scalability

The other advantage is scalability. As we discuss below, this system scales automatically without human intervention or further development.

The scalability of a system is determined by its weakest component. We have two components: S3 and Firehose. Both services scale automatically, but in the case of Firehose the default throughput limit is 5,000 records/second and 5 MB/second. 5 MB/second amounts to roughly 421 GB a day, which is a lot, but if you need more, all you have to do is open a limit increase ticket. Beyond this default limit the scalability is infinite in theory, though we have no experience with volumes high enough for these services to fail to scale up.
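The 421 GB figure is simply the 5 MB/second default limit extrapolated over a full day:

# 5 MB/second sustained over a whole day, converted with 1 GB = 1024 MB.
seconds_per_day = 24 * 60 * 60      # 86,400
mb_per_day = 5 * seconds_per_day    # 432,000 MB
print(mb_per_day / 1024)            # 421.875 -> ~421 GB per day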

Epilogue

With AWS services it’s easy to set up a managed logging system that requires no maintenance or running servers. This is important because it leaves less room for error and you spend fewer resources on maintenance. We also showed that this infrastructure design is infinitely scalable with almost no user interaction.

In Part 2 we will analyze the stored logs with simple SQL, using Athena.