Steve Hostettler

Microservice Architecture - Part 3 (Diving into microservices)

2019-03-04T22:51:20+01:00

Introduction

In Part 1, we configured, compiled, and deployed microservices. In Part 2, we configured and deployed non-functional services such as security and logging. In this chapter, we will dive into the concept of microservices. First, we will discuss why the community came up with yet another architecture paradigm. Secondly, we will look at some definitions and the main properties of such an architecture. Then we will detail some of the technologies that microservice architectures leverage to deliver maximum value. After that, we will discuss some of the architecture patterns related to microservices. Finally, we will analyse the pros and cons of this architecture.

Why (yet) another architecture paradigm

The microservice architecture can be seen as a reaction to the monolith architecture, and in particular to the fact that the bigger the application, the slower the pace of change. A monolithic application is self-contained, and its modules mostly communicate through method calls. While monoliths do present some advantages, for instance in terms of performance (not scalability) and consistency, they have a tendency to evolve into a complex “thing” that runs out of control. Modifying a monolith requires rebuilding it completely and shipping it in one block. Furthermore, by its very nature, a monolith tends to favor API leaks and to decrease modularity. Finally, having one block means that scalability happens at that granularity. This can waste resources, as all the modules are scaled at the same time even though they might need different kinds of scalability, or none at all.

Following the model provided by M. L. Abbott and M. T. Fisher [1], scalability follows three dimensions:

X-axis : horizontal duplication
Y-axis : functional decomposition
Z-axis : data partitioning

While X and Z scalability is feasible with a monolith, Y (functional) scalability is not. It is all or nothing.
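To make the three axes more concrete, here is a minimal sketch in Java. It is not taken from the sample application: the class name ScaleCubeSketch, the methods pickReplica, pickService, and pickShard, and the service names are all hypothetical and only illustrate what each axis routes on.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of the three scaling axes; not part of the sample application. */
final class ScaleCubeSketch {

    private static final AtomicInteger COUNTER = new AtomicInteger();

    /** X-axis: identical replicas, requests are spread round-robin. */
    static String pickReplica(List<String> replicas) {
        int index = Math.floorMod(COUNTER.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    /** Y-axis: each business function is routed to its own deployment unit. */
    static String pickService(Map<String, String> serviceByFunction, String function) {
        String service = serviceByFunction.get(function);
        if (service == null) {
            throw new IllegalArgumentException("No service for function " + function);
        }
        return service;
    }

    /** Z-axis: the same code runs everywhere, but each shard owns a slice of the data. */
    static int pickShard(String entityKey, int shardCount) {
        return Math.floorMod(entityKey.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("monolith-1", "monolith-2");
        System.out.println(pickReplica(replicas));
        System.out.println(pickReplica(replicas));
        System.out.println(pickService(Map.of("valuation", "valuation-service"), "valuation"));
        System.out.println(pickShard("counterparty-42", 4));
    }
}
```

Only pickService needs the application to be split by function, which is why the Y-axis is out of reach for a single deployment unit.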

The figure below presents a typical monolith architecture. Please note that:

usually a monolith is implemented as a multi-tier architecture
monolith does not mean that it is not modular. It means that it is a “monolithic” unit of deployment. Everything is deployed together.
cross-cutting concerns are shared and handled at the monolith instance level
inter-service (module) communication usually happens through method calls (vs. inter-process communication)
scalability happens at the monolith level

  
  Monolith architecture

Breaking the monolith apart into small pieces makes it possible to take architectural decisions at that level, for instance to choose the most appropriate programming language or technology for each task. That being said, adding heterogeneity and increasing the number of deployment units comes at a price in terms of necessary infrastructure and thus complexity.

Definition(s)

At its core, the microservice architecture (MSA) is an evolution of the service-oriented architecture (SOA). It inherits several key concepts from SOA, the most important one being that business added value is delivered by combining a collection of loosely coupled services.

Reusing the definition of services from SOA, a service has the following properties:

It logically represents a business activity with a specified outcome.
It is self-contained.
It is a black box for its consumers (and the communication between consumer and provider is formalized by a contract).
It may consist of other underlying services.

These properties apply to microservices as well. The main difference between SOA and MSA is in the granularity of the services and some opinionated implementation choices. A key point to understand is that microservices were not invented in isolation: they emerged alongside other game changers such as Agile, DevOps concepts, and Cloud computing.

Adding these influences to the mix brings the following properties:

each service is expected to be elastic (horizontal scalability), resilient (failover), composable, minimal, and self-contained.
each service must support automation, deployment, and testing as first-class citizens.
each service is specialized in one thing and in doing that thing right.

The term microservice was coined around 2012 [2][3]. People trace it back to a workshop in Venice in 2011 but I was not able to find the proceedings.

  
  Microservice architecture

Technologies

Two main technologies are usually linked to microservices:

Cloud native technologies

Due to their distributed nature, microservices call for a distributed way of deploying and managing them. Therefore, they are inherently linked to cloud native technologies such as containers and container orchestration. As a matter of fact, without these technologies, developing, building, and deploying a microservice architecture would be so tedious and cumbersome that it would quickly collapse under its own weight.

Docker and Kubernetes bring the ease of deployment and management that is required to deal with hundreds of units. You can see Kubernetes as the operating system of the cloud and Docker as the process of the cloud. Thanks to Docker, developers do not have to deal with OS/platform-specific configurations. If it works in the container (mostly Linux), it will work in the cloud. Configuration can be scripted so that the deployment is repeatable and automated. In association with Kubernetes, one can easily manage the elasticity and health of the microservice ecosystem. Not to mention that a number of third-party tools are available as images that can be composed at will to provide crucial services such as logging, authentication, authorization, and monitoring.

Message broker

Another technology often associated with microservices is the message broker. Message brokers are basically buses that can exchange messages at very high speed in a distributed and elastic way. At the moment, Kafka is the most well-known of these. It is very often used as a communication layer between the microservices. Kafka helps maintain consistency by propagating messages in an asynchronous and transactional way. Message brokers are not a new concept, and they can be associated with the good old Enterprise Service Buses (ESB) of the Service Oriented Architecture era. The main difference is that, in order to avoid the ESB antipattern, message brokers adopt a more lightweight approach. The big selling argument for ESBs was that they would allow proper composition, discovery, and monitoring of the services as well as message and protocol transformation. The problem was that, in the end, all the business logic of the company was contained in the bus, and thus it was extremely difficult to maintain. Microservices over message brokers take a more decentralized approach by leaving the responsibility for transformation and composition to the microservices.

Additional properties of microservices

Some of the properties of microservices are not inherited from the definition but rather from implementation decisions. Please note that some of these properties might sound counter-intuitive at first, but they emerged to solve practical problems, especially to limit service coupling.

Single Database per service

Sharing databases across multiple microservices increases coupling. Changing a database model for one service might impact other services. Furthermore, depending on the usage, you might prefer a good old relational database or a key-value store. Having one database per service solves these problems at the expense of maintaining more technologies, instances, and models.

Low cross-service reuse

This is, in my opinion, the most counter-intuitive thing. We have been told for years to reuse and not to duplicate code. And here it is: code duplication is promoted. More specifically, the best practice is to not create “common” libraries. I would not be so “extreme” and would simply say not to create common libraries with shared business code (for instance, no shared DTOs).

One Domain per service

This is the most intuitive of the rules: restrict your microservices to deal with one and only one business domain. For instance, do not mix services for sales and for accounting. If, within accounting, you have two accounting standards, then have two services.

Service Granularity

Choosing the right granularity for your services is more an art than a science. There are numerous articles out there to help you choose the right granularity [4] [5]. Finding the right level of granularity is usually a tradeoff between maintainability and scalability on the one hand, and deployment complexity and performance on the other hand. It is always a choice between maintainability at the micro level (the service) and the macro level (the whole ecosystem).

Although it is difficult to predict the actual performance penalty of microservices, as it depends on the use case, [5] predicts a 10% penalty per hop (microservice to microservice communication) on the total roundtrip. I like the breakfast example [4], where a single macro service called prepare breakfast ends up being 3, 12, or 60 services depending on the level of granularity. Now imagine that we need to scale the 60 services to have, say, 3 instances each. All of a sudden you have 180 instances to maintain and to manage. This quickly becomes much more complex. In that example, the right level is probably 12, as it offers some valuable reuse and still limits the complexity. That being said, nothing stops you from having different levels of granularity per domain depending on the expected evolvability, reuse, and performance. As a rule, it is better to start with coarser services and to go granular on a case-by-case basis.
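The numbers in the breakfast example can be reproduced with a few lines of Java. This is only a back-of-the-envelope sketch: the class GranularitySketch and its methods are hypothetical, and the latency model (each hop adds 10% of the base round trip, without compounding) is a simplifying assumption of mine, not the exact model used in [5].

```java
/** Back-of-the-envelope numbers for the breakfast example; the latency model is an assumption. */
final class GranularitySketch {

    /** Number of running instances when every service is replicated the same number of times. */
    static int instances(int services, int replicasPerService) {
        return services * replicasPerService;
    }

    /**
     * Assumed model: each additional hop adds 10% of the base round trip.
     * The additive (rather than compounding) form is a simplification made for this sketch.
     */
    static double roundTripMillis(double baseMillis, int hops) {
        return baseMillis * (1.0 + 0.10 * hops);
    }

    public static void main(String[] args) {
        for (int services : new int[] {3, 12, 60}) {
            System.out.println(services + " services x 3 replicas = "
                    + instances(services, 3) + " instances");
        }
        System.out.println("100 ms base with 4 hops = " + roundTripMillis(100.0, 4) + " ms");
    }
}
```

With 3 replicas each, the three granularity levels give 9, 36, and 180 instances. The count grows linearly with the number of services, but every instance adds deployment, monitoring, and communication paths to manage.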

There are recommendations out there that a microservice should be between 50 and 500 lines of code. This is, in my opinion, the worst possible metric out there. It is way too dependent on the language and technology.

The most important rule about the granularity is to respect the service boundaries:

It doesn’t share database tables with another service.
It has a minimal amount of business entities.
It is stateless (and if stateful it is on purpose).
It takes data (un)availability into account, for instance by implementing local caches for non-managed entities.
It is the single source of truth for the business entities it manages.

Architecture patterns

Command Query Responsibility Segregation (CQRS)

Command Query Responsibility Segregation is all about … guess what … reducing coupling. The idea is to have a different API for querying data and for creating/updating them. At its core, it segregates the model and storage used to query data from the model and storage that keep the single version of the truth of a particular business entity. The two APIs can be part of the same microservice or live in two different microservices. One use case for having them in two separate microservices is to be able to scale out querying independently of writing (or vice versa).

As each microservice can have its own store, you could imagine using Cassandra (which is known to be very efficient at writing) for persisting and Elasticsearch (which is known to be very efficient at reading) for querying. Command Query Responsibility Segregation (CQRS) was first introduced by Greg Young [6] and is itself an evolution of Command Query Separation (CQS) by Bertrand Meyer [7].

In the sample architecture, the instrument service has one view of the instrument model but the valuation service has another one. To be fully CQRS compliant, the instrument service should have had a different model to persist and to query. As JPA does not support two entities on the same database object, we would have to use constructs like JPA queries with select new to support a different model for persistence and for reading. For more information on how to use select new, please refer to this article and this article.

CQRS is often associated with Event Sourcing (see below).
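The separation between the command side and the query side can be sketched in plain Java. The following is a minimal in-memory illustration, not the code of the sample application: the classes CqrsSketch, Instrument, InstrumentView, InstrumentCommands, and InstrumentQueries are hypothetical, and the two maps stand in for what could be two different stores.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory CQRS sketch; names are illustrative, not the sample application's. */
final class CqrsSketch {

    /** Write model: the full entity, owned by the command side. */
    static final class Instrument {
        final String isin;
        final String currency;
        final double nominal;

        Instrument(String isin, String currency, double nominal) {
            this.isin = isin;
            this.currency = currency;
            this.nominal = nominal;
        }
    }

    /** Read model: a flat view shaped for the queries, not for persistence. */
    static final class InstrumentView {
        final String isin;
        final String label;

        InstrumentView(String isin, String label) {
            this.isin = isin;
            this.label = label;
        }
    }

    /** Query API: reads only, backed by its own store. */
    static final class InstrumentQueries {
        private final Map<String, InstrumentView> readStore = new HashMap<>();

        Optional<InstrumentView> findByIsin(String isin) {
            return Optional.ofNullable(readStore.get(isin));
        }

        /** Projection: called by the command side after each successful write. */
        void project(Instrument instrument) {
            String label = instrument.nominal + " " + instrument.currency;
            readStore.put(instrument.isin, new InstrumentView(instrument.isin, label));
        }
    }

    /** Command API: writes only, holds the single version of the truth. */
    static final class InstrumentCommands {
        private final Map<String, Instrument> writeStore = new HashMap<>();
        private final InstrumentQueries queries;

        InstrumentCommands(InstrumentQueries queries) {
            this.queries = queries;
        }

        void register(String isin, String currency, double nominal) {
            if (nominal <= 0) {
                throw new IllegalArgumentException("nominal must be positive");
            }
            Instrument instrument = new Instrument(isin, currency, nominal);
            writeStore.put(isin, instrument);
            queries.project(instrument); // synchronous here for brevity; often asynchronous
        }
    }

    public static void main(String[] args) {
        InstrumentQueries queries = new InstrumentQueries();
        InstrumentCommands commands = new InstrumentCommands(queries);
        commands.register("CH0000000001", "CHF", 1000.0);
        System.out.println(queries.findByIsin("CH0000000001").map(v -> v.label).orElse("unknown"));
    }
}
```

Validation lives only on the command side, while the query side exposes a view that can be reshaped or moved to another store without touching the write model.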

  
  CQRS + Event Sourcing architecture

Event sourcing & Message Bus

In an architecture where neither the data store nor the data model is supposed to be shared, keeping the consistency between the different stakeholders is challenging. Having distributed transactions that span across multiple services is difficult to implement and even more difficult to maintain. That is why, instead of looking for consistency at any point in time, we are looking at eventual consistency. Of course, eventual consistency might be a problem in some use cases and that must be assessed on a case by case basis.

The fundamental concept behind Event Sourcing is that all changes to an application state are stored as a sequence of events [8]. In other words, changing the value of a given field for a given entity is stored as a message. Ultimately, you can reconstruct the current state by replaying all the messages starting with the initial state.

In the sample architecture (see below), the instrument service writes any change to a message broker, which distributes it to all the microservices that need it, in this case the valuation service and the regulatory reporting service. The instrument service holds the single version of the truth and all other services will eventually be consistent. Of course, internally, the instrument service must guarantee that updating its own store and publishing to the bus happen transactionally.

Since the changes are stored in the bus, they can be replayed, for instance when a new instance of a service joins and needs to build its internal state.
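Replaying events to rebuild a state can be illustrated with a small in-memory sketch in Java. The names EventSourcingSketch, FieldChanged, EventLog, and replay are hypothetical, and the list inside EventLog merely stands in for a broker such as Kafka. It is not how the sample application is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory event sourcing sketch; the list stands in for a message broker. */
final class EventSourcingSketch {

    /** One immutable change: a field of an entity now has a new value. */
    static final class FieldChanged {
        final String entityId;
        final String field;
        final String value;

        FieldChanged(String entityId, String field, String value) {
            this.entityId = entityId;
            this.field = field;
            this.value = value;
        }
    }

    /** Append-only log of events, kept in publication order. */
    static final class EventLog {
        private final List<FieldChanged> events = new ArrayList<>();

        void append(FieldChanged event) {
            events.add(event);
        }

        /** Returns a snapshot of all events published so far. */
        List<FieldChanged> readAll() {
            return List.copyOf(events);
        }
    }

    /** Rebuilds the current state by replaying every event on top of an empty initial state. */
    static Map<String, Map<String, String>> replay(List<FieldChanged> events) {
        Map<String, Map<String, String>> state = new HashMap<>();
        for (FieldChanged event : events) {
            state.computeIfAbsent(event.entityId, id -> new HashMap<>())
                 .put(event.field, event.value);
        }
        return state;
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append(new FieldChanged("instrument-1", "rating", "A"));
        log.append(new FieldChanged("instrument-1", "currency", "CHF"));
        log.append(new FieldChanged("instrument-1", "rating", "B"));

        // A consumer that joins late replays the log and ends up with the latest values.
        Map<String, Map<String, String>> state = replay(log.readAll());
        System.out.println(state.get("instrument-1").get("rating"));
    }
}
```

Because the events are applied in order, the late-joining consumer ends up with rating B, the most recent value, without ever having talked to the instrument service directly.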

API Composition / API Gateway

As we already discussed, non-functional concerns such as load-balancing, versioning, security (authentication, authorization, TLS termination) must be managed at the microservice level.

Furthermore, maximizing microservice reusability implies granular services. The client could easily compose the microservices but this means a high coupling between the client and the services as well as a lot of traffic between the client and the microservices.

One way of mitigating this is to add an API gateway in front of the microservices to avoid direct coupling between the client and the individual services. Besides, it handles composition locally in the application network. Finally, it applies cross-cutting concerns uniformly to all the requests.
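The two roles of the gateway described above, composition and uniform cross-cutting concerns, can be sketched as follows. This is a deliberately simplified Java illustration: the class GatewaySketch is hypothetical, the backend services are modeled as plain functions instead of HTTP calls, and the token check is a placeholder for real authentication. It does not reflect how Kong or the sample application work internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical API gateway sketch; backends are plain functions instead of HTTP calls. */
final class GatewaySketch {

    private final Function<String, String> instrumentService;
    private final Function<String, String> valuationService;

    GatewaySketch(Function<String, String> instrumentService,
                  Function<String, String> valuationService) {
        this.instrumentService = instrumentService;
        this.valuationService = valuationService;
    }

    /** Cross-cutting concern applied once, in front of every backend (placeholder check). */
    private static void authenticate(String token) {
        if (token == null || token.isEmpty()) {
            throw new SecurityException("missing token");
        }
    }

    /** Composition: one client request fans out to two services inside the application network. */
    Map<String, String> instrumentWithValuation(String token, String isin) {
        authenticate(token);
        Map<String, String> response = new LinkedHashMap<>();
        response.put("instrument", instrumentService.apply(isin));
        response.put("valuation", valuationService.apply(isin));
        return response;
    }

    public static void main(String[] args) {
        GatewaySketch gateway = new GatewaySketch(
                isin -> "details of " + isin,
                isin -> "value of " + isin);
        System.out.println(gateway.instrumentWithValuation("some-token", "CH0000000001"));
    }
}
```

The client makes a single call and only knows the gateway, so the individual services can be split, merged, or moved without breaking it.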

The sample application relies on an application gateway to deliver authentication, versioning and TLS termination (see figure below)

  
  Network topology and high level component view of the micro-service architecture

Pros and Cons

Let’s reflect on the pros and cons of the microservice architecture. Like any architecture, it is a tradeoff between a set of non-functional requirements.

Pros

First, the positive aspects:

Scalability

Scalability (both horizontal and vertical) can happen at a very granular level. Therefore, no resources are wasted on scaling components that do not require it (as would be the case for a monolith). Furthermore, because it is easy to segregate stateful from stateless components, scalability happens most of the time on pure stateless microservices. In a monolith, one stateful component forces the scalability of the whole monolith to take statefulness into account.

Failover, Fault Tolerance, High-Availability

In a very similar way to scalability, Failover, Fault Tolerance, and High-Availability can be targeted to the components that require it. Similarly, statefulness can be limited to the few services that require it (if any).

Time to market, Adaptability

Because the services are small and loosely coupled, they can be changed and deployed with limited risk of regression in other services. This reduces time to market and is a big step toward continuous delivery.

Team Independence

As with time to market, loose coupling implies low inter-team dependency. That being said, this is sort of a by-product of the microservice architecture. A modular monolith should in theory achieve the same level of loose coupling. The problem is that when calls are internal (rather than going through the network), developers have a higher tendency to break module boundaries, thus increasing the coupling.

Technology Adaptability

Again, loose coupling and low dependency enable new practices such as choosing the technology on a case-by-case basis. That being said, depending on your organization, this can be more of a curse than a blessing. You can very quickly end up with exotic technologies and languages that only the microservice creator knows.

Reusability

Thanks to the granularity, services are much more dedicated to a particular task. This favors reusability by composition.

Cons

Like any architecture, there is no free lunch:

Increased resource consumption

A microservice architecture entails many more running instances (e.g., VMs, JVMs) than its monolithic counterpart. Furthermore, entities are very often replicated between the instances (to increase loose coupling). All of that leads to higher overall resource (memory and CPU) consumption. This is compensated by the fact that more resources are available than in the years before the cloud era.

Operational Overhead / Deployment complexity

The profusion of services and their associated dependencies (DB, message broker, …) can very quickly lead to an operational nightmare. The operation team needs to master the concepts and the related tools for monitoring the ecosystem (Docker, Kubernetes, ELK, …). This has a cost, both in terms of skills and manpower.

Cross-Cutting concerns

Since the cross-cutting concerns are managed at the service level, they can significantly complicate deployment: a microservice might be straightforward on its own, but add authentication, authorization, logging, versioning, failover, and load balancing and it is a completely different story. Doing system testing on such environments can be very challenging.

Architecture Complexity : Distributed system

Microservices are distributed by nature and thus expose developers to “The 8 fallacies of distributed computing”:

The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn’t change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.

When designing your microservice architecture, you will have to think of all of the above. Another interesting law to remember is Fowler’s [9]:

Fowler’s first law of distributed computing : don’t distribute your objects

Fowler rightfully insists in this post that there is a huge difference between distributing objects and (micro)services, and he underlines that his 2004 comment is not in any way applicable to microservice architecture. Although I agree that the two are very different, I think that it is interesting to look at his entire chapter, which is available online. Before laying down the law, Fowler spends some time explaining how he came to that conclusion. Most of the argument is that not choosing the granularity of the interfaces (API) carefully will lead to poor performance and will massively increase complexity. This is also very true for microservices.

While there are architecture patterns and tools to deal with these issues, they mean extra complexity, and therefore you should ask yourself where distribution does and where it does not make sense.
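As an illustration of the extra code that the first fallacy (“the network is reliable”) forces on you, here is a minimal retry helper in Java. It is only one of many possible patterns and a sketch under my own assumptions: the class RetrySketch, the method callWithRetry, and the linear backoff policy are hypothetical and are not part of the sample application.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical retry helper: a call over the network may fail and has to be retried. */
final class RetrySketch {

    /** Calls the remote operation up to maxAttempts times, waiting a bit longer after each failure. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                lastFailure = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis * attempt); // linear backoff between attempts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", e);
                }
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", lastFailure);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated remote call that fails twice before succeeding.
        String answer = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("simulated network failure");
            }
            return "valuation ready";
        }, 5, 10L);
        System.out.println(answer + " after " + calls[0] + " calls");
    }
}
```

A method call inside a monolith needs none of this. Every remote call, on the other hand, needs a decision about timeouts, retries, and what to do when all attempts fail.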

Eventual consistency

Hereafter is a summary of the main “pros” and “cons”. Please note that this is not an absolute evaluation. The weight that you should put on each of these is highly dependent on your use case and context. For instance, team independence is not a real concern in a small startup with 5 employees. On the other hand, operational complexity tends to be less of a problem if you already have skilled DevOps teams.

Property	Microservice	Monolith
Scalability	+	-
Release/Updatability, Time to market	+	-
Failover, Fault Tolerance, High Availability	+	-
Team Independence	+	-
Technology Adaptability	+	-
Reusability	+	-
Resources Consumption (like for like for a given throughput/volume)	-	+
Operational Overhead	-	+
Cross-Cutting Concerns (Security, Logging, Caching, Auditing, Configuration, …)	-	+
Architecture complexity (Distribution, Consistency, Governance, Integration testing, …)	-	+

Closing Thoughts

To sum it up: whereas microservice architecture is a powerful tool in your toolbox, it is in no way a silver bullet. Microservice architecture is well suited when a high level of elasticity is required, but it comes at a price in terms of complexity of operations. Architecture is always a tradeoff between non-functional requirements, and therefore the first thing to do is to establish these NFRs. If scalability and updatability are not an issue, then a modular monolith is probably more appropriate. Furthermore, there is nothing stopping you from adopting a hybrid approach by starting to break a monolith into coarse services and refining them as needed.

Bibliography

[1] Martin L. Abbott and Michael T. Fisher. 2009. The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise (1st ed.). Addison-Wesley Professional.
[2] James Lewis. 2013. Micro services - Java, the Unix Way. 33rd Degree Conference, Krakow Poland. http://2012.33degree.org/talk/show/67
[3] Fred George. 2013. MicroService Architecture, https://www.slideshare.net/fredgeorge/micro-service-architecure
[4] Christian Verstraete, 2017, https://cloudsourceblog.com/2017/01/03/cooking-breakfast-and-microservice-granularity/
[5] Shadija, D., Rezai, M., & Hill, R. (2017). Microservices: Granularity vs. Performance. In UCC 2017 Companion - Companion Proceedings of the 10th International Conference on Utility and Cloud Computing (pp. 215-220). Association for Computing Machinery, Inc. https://doi.org/10.1145/3147234.3148093
[6] Greg Young, 2010. CQRS Documents. https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf
[7] Bertrand Meyer. 1988. Object-Oriented Software Construction (1st ed.). Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
[8] Martin Fowler, 2005, Event Sourcing : https://martinfowler.com/eaaDev/EventSourcing.html
[9] Martin Fowler. 2002. Patterns of Enterprise Application Architecture. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

Microservice Architecture - Part 1 (A running microservice architecture)

2019-02-17T22:51:20+01:00

Introduction

This series of blog posts aims at helping students at the University of Geneva to develop their first application following microservice principles. Besides explaining the concepts and implementation details of the microservice architecture, we will also discuss software development practices such as software factories and innovative deployment options such as containers and container composition. All samples and a complete working application can be found on GitHub.

The following diagram represents the end-state of our microservice architecture. From a business perspective, it delivers RegTech services. More specifically, it manages counterparties and financial instruments. It valuates a portfolio and finally provides some regulatory reporting. You do not need deep financial knowledge; suffice it to say that:

A counterparty is an individual or a company participating in a financial transaction. For more details.
A financial instrument is an asset that can be traded such as stocks, loans, and the likes. For more details.
Portfolio valuation is the action of evaluating the net value of a set of assets. For more details.
Financial institutions must comply with a set of regulations, such as delivering monthly reports that state their financial health.

  
  Network topology and high level component view of the micro-service architecture

Besides these “business” services, the architecture delivers a set of non-functional services such as:

A Central logging mechanism to deal with the distributed nature of the architecture. It relies on a Logspout companion container that sends the logs from all the containers to a concentrator called Logstash that in turn sends them to a database optimized for searching called ElasticSearch. Finally, Kibana provides visualization and analysis of the logs.
A Message broker to increase service decoupling and scalability. Kafka in this case.
An API-Gateway that provides routing, load-balancing and SSO to the micro-services by integrating an identity manager called Keycloak. Furthermore, Kong delivers API-Gateway services (e.g., security, API composition and aggregation). The API-Gateway also shields the user from knowing the ugly details of the network topology. Furthermore, it protects the backend by establishing a clear front vs back network separation, it exposes static resources and, finally, it provides TLS termination.

From a technology perspective, the microservices are implemented using JEE 8 and its MicroProfile, more specifically Thorntail [3]. Furthermore, the microservices are packaged as Docker [1][2] containers using Maven [4] as a build tool.

This chapter describes step by step how to compile and deploy the microservices themselves. Part 2 describes how to set up non-functional services such as SSO (Single Sign On), API concentration, and logging. Because of its distributed nature, in a microservice architecture the non-functional infrastructure is as important as the actual services. Part 3 dives deeper into what a microservice architecture actually is, its benefits and drawbacks, and some details on the related technologies. Part 4 focuses on the software factory, putting everything together and testing the result. Finally, Part 5 does the autopsy of a microservice, detailing the associated design patterns.

Pre-requisites

Note: This series of blog posts leverages a lot of different technologies. Please take the time to install everything properly. It will save time later on.

To execute the samples, you will need to install and to configure the following tools:

a “reasonably” powerful computer with Linux (whatever recent distribution) or Windows (min. Windows 10) to support Docker. Mac is ok as well, but it requires some additional steps that will not be described here.
a working Docker environment to deploy the services locally.
a JDK 11 to compile and run the services
a Git client for collaboration
NodeJS and the Angular tooling
Apache Maven for the automation.
a bash interpreter (on Windows you can rely on Git bash that is usually installed with Git)

On top of that you need to have:

An intermediate level in Java
Some basic understanding of OS (including bash scripting) and networking (DNS, TCP, HTTP)
a great deal of patience and coffee

Note: We will start a lot of containers, please grant at least 6GB RAM and 6GB swap to your docker-machine

Getting the backend components to run

First things first, let’s check out the code and compile everything. Before you start complaining, yes, this section is tedious but we have to have the environment set up before diving into the wonderful world of microservices. Let’s start by cloning the code from GitHub.

$ git clone https://github.com/hostettler/microservices.git

Cloning into 'microservices'...
remote: Enumerating objects: 875, done.
remote: Counting objects: 100% (875/875), done.
remote: Compressing objects: 100% (596/596), done.
remote: Total 875 (delta 279), reused 787 (delta 203), pack-reused 0
Receiving objects: 100% (875/875), 3.68 MiB | 912.00 KiB/s, done.
Resolving deltas: 100% (279/279), done.

Let’s check that Maven and Java are correctly installed.

$ java -version

openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

$ mvn -v

Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T20:33:14+02:00)
Maven home: ...
Java version: 11.0.1, vendor: Oracle Corporation, runtime: ...
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"

The next step is to compile the project to produce the artifacts (i.e., binaries) that are required. To that end, we use Apache Maven. Maven is an opinionated build tool:

Opinionated Software is a software product that believes a certain way of approaching a business process is inherently better and provides software crafted around that approach.

Namely, following its opinion makes our life easier and requires less effort. For more information and tutorials please refer to this Maven Tutorial. The output of the build process is a set of “JAR” files (i.e., Java libraries) that are stored in your local ~/.m2 repository for later use.

$ cd microservices/
$ mvn clean install

[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Parent Pom of the Pinfo Micro Services                             [pom]
[INFO] Counterparty Service                                               [war]
[INFO] Instrument Service                                                 [war]
[INFO] Valuation Service                                                  [war]
[INFO] Regulatory Reporting Service                                       [war]
[INFO] API Gateway Service                                                [war]
[INFO]
[INFO] -------------------< ch.unige:pinfo-micro-services >--------------------
[INFO] Building Parent Pom of the Pinfo Micro Services 0.2.0-SNAPSHOT     [1/6]
[INFO] --------------------------------[ pom ]---------------------------------
....
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Parent Pom of the Pinfo Micro Services 0.2.0-SNAPSHOT SUCCESS [  3.216 s]
[INFO] Counterparty Service ............................... SUCCESS [ 42.866 s]
[INFO] Instrument Service ................................. SUCCESS [ 49.720 s]
[INFO] Valuation Service .................................. SUCCESS [ 32.623 s]
[INFO] Regulatory Reporting Service ....................... SUCCESS [ 23.048 s]
[INFO] API Gateway Service 0.2.0-SNAPSHOT ................. SUCCESS [ 22.796 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:55 min
[INFO] Finished at: 2019-02-20T18:09:09+01:00
[INFO] ------------------------------------------------------------------------

Tip: Congratulations, you compiled the microservices.

At this point, you have managed to compile all of the Java code and you have created Maven artifacts for each microservice (Java Archives a.k.a. JARs). However, as we will see in the next chapters, a microservice architecture is much more than a bunch of microservices. We will need a lot of additional third-party tools and services. These additional services (e.g., logging, security) are usually provided as container images that run on Docker. To be able to run the microservices alongside these third-party tools, we need to package the microservices as Docker images.

Simply put, Docker provides lightweight virtualization. It has a smaller footprint compared to the usual virtual machine approaches (VirtualBox, VMware). The main difference is that the OS layer is not replicated in each container but rather shared.

Docker containers run Docker images that are merely lightweight Linux systems with additional software. Let’s first check whether Docker is properly installed.

$ docker -v

Docker version 18.09.2, build 6247962

$ docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Digest: sha256:2557e3c07ed1e38f26e389462d03ed943586f744621577a99efb77324b0fe535
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

...

Before we continue, let’s have a look at a Docker survival kit:

docker ps displays all the running containers.
docker ps -a displays all containers, including the stopped ones (not running but still using some resources).

Based on that:

docker kill $(docker ps -q) kills all running containers.
docker rm $(docker ps -a -q) removes all stopped containers.
docker system prune cleans up all dangling data.
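One caveat with these one-liners: when no container matches, `$(docker ps -q)` expands to nothing and `docker kill` (or `docker rm`) exits with a usage error because it requires at least one argument. A minimal sketch of a guarded cleanup, assuming a POSIX shell; the function name `docker_cleanup` is made up for this example and is not part of the project:

```shell
# Hypothetical helper: kill running containers, remove stopped ones,
# then prune, skipping the steps that have nothing to do.
docker_cleanup() {
  running=$(docker ps -q)
  if [ -n "$running" ]; then
    # Word splitting is intended here: one argument per container id.
    docker kill $running
  fi
  stopped=$(docker ps -a -q)
  if [ -n "$stopped" ]; then
    docker rm $stopped
  fi
  # -f skips the interactive confirmation prompt.
  docker system prune -f
}
```

Calling `docker_cleanup` on a machine without any container simply runs the prune step.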

So the Docker daemon is up and running. Let’s create the Docker images for the microservices. This step reuses the JAR files created previously and packages them along with a Linux system so that every image can be run independently.

mvn install -Ppackage-docker-image

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Parent Pom of the Pinfo Micro Services 0.2.0-SNAPSHOT SUCCESS [  6.114 s]
[INFO] Counterparty Service ............................... SUCCESS [ 59.134 s]
[INFO] Instrument Service ................................. SUCCESS [ 58.533 s]
[INFO] Valuation Service .................................. SUCCESS [ 42.806 s]
[INFO] Regulatory Reporting Service ....................... SUCCESS [ 33.979 s]
[INFO] API Gateway Service 0.2.0-SNAPSHOT ................. SUCCESS [ 14.134 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:35 min
[INFO] Finished at: 2019-02-20T18:58:34+01:00
[INFO] ------------------------------------------------------------------------

All the Docker images for the microservices have been created. Let’s double check:

$ docker image ls | grep unige

unige/regulatory-service       latest    5859668ecfb1        12 seconds ago       778MB
unige/valuation-service        latest    93516633b7b3        48 seconds ago       814MB
unige/instrument-service       latest    b1bded92050c        About a minute ago   814MB
unige/counterparty-service     latest    1789c8543673        2 minutes ago        780MB
unige/api-gateway              latest    b355613b0bbd        32 hours ago         371MB

Tip: You now have Docker images for the microservices and for the api-gateway.

Let’s start a Docker container with the counterparty microservice and map port 8080 of the container to port 10080 of the host. Conceptually, this starts a minimal Linux system and runs the microservice as its first process (PID 1). The container provides all the services you would expect from any Linux system, such as networking, security, and isolation.

$  docker run --name myCounterpartyService -p 10080:8080 unige/counterparty-service:latest

2019-02-26 22:28:06,527 INFO  [org.jboss.as.server] (main) WFLYSRV0010: Deployed "counterparty-service-0.2.0-SNAPSHOT.war" (runtime-name : "counterparty-service-0.2.0-SNAPSHOT.war")
2019-02-26 22:28:06,569 INFO  [org.wildfly.swarm] (main) THORN99999: Thorntail is Ready

Tip: Open a browser and navigate to http://localhost:10080/counterparties to display a long list of counterparties.

This demonstrates that a web service is listening on port 10080 of localhost. More specifically, we started a container with the image of the counterparty microservice. Port 8080 of the container is mapped to port 10080 of the host so that we can test it. Furthermore, we named the container myCounterpartyService.

As it is a fully running Linux system, you can connect to the container to inspect it. In another console, we can run a docker ps command to list running containers.

docker ps

CONTAINER ID        IMAGE                               COMMAND                  CREATED             STATUS              PORTS                     NAMES
dfb9acf07d79        unige/counterparty-service:latest   "/bin/sh -c 'java -D…"   42 seconds ago      Up 40 seconds       0.0.0.0:10080->8080/tcp   myCounterpartyService

As you can see, there is one running container named myCounterpartyService that listens on port 10080 of localhost.

Let’s test it by connecting to http://localhost:10080/counterparties/724500J4K3Q60O9QLF45, either with a browser or with the curl command line. counterparties is the context name of the service and 724500J4K3Q60O9QLF45 is the id (the LEI) of the particular counterparty we want the details of.

curl -X GET http://localhost:10080/counterparties/724500J4K3Q60O9QLF45

{"lei":"724500J4K3Q60O9QLF45","name":"Ton Smit Onroerend Goed B.V.","legalAddress":{"firstAddressLine":"Van Teylingenweg 126","city":"Kamerik","region":"","country":"NL","postalCode":"3471GG"},"registration":{"registrationAuthorityID":"RA000463","registrationAuthorityEntityID":"52431649","jurisdiction":"NL","legalFormCode":"54M6","category":"","registrationDate":1545264000000,"lastUpdated":1545264000000,"registrationStatus":"ISSUED","nextRenewalDate":1576800000000},"status":"ACTIVE"}
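The date fields in this response (registrationDate, lastUpdated, nextRenewalDate) look like Unix epoch timestamps expressed in milliseconds. A small sketch to make them readable, assuming GNU date (the `-d @seconds` form is not available in BSD/macOS date); the helper name `millis_to_date` is hypothetical:

```shell
# Convert an epoch timestamp in milliseconds to an ISO date (UTC).
# Assumes GNU date; millis_to_date is a made-up helper name.
millis_to_date() {
  seconds=$(( $1 / 1000 ))
  date -u -d "@$seconds" +%Y-%m-%d
}

millis_to_date 1545264000000   # registrationDate  -> 2018-12-20
millis_to_date 1576800000000   # nextRenewalDate   -> 2019-12-20
```

Read that way, the renewal date falls exactly one year after the registration date.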

We can stop the service as follows:

docker stop myCounterpartyService

And check that nothing is running anymore:

docker ps

CONTAINER ID        IMAGE                  COMMAND          CREATED     STATUS       PORTS          NAMES

So far we only ran one service. To run all the microservices (plus the message broker), we will compose the images using docker-compose. docker-compose is a way to script a series of complex Docker configurations to provide a coherent ecosystem.

cd docker-compose/
docker-compose -f docker-compose-microservices.yml up

instrument-service    | 2019-02-28 07:55:54,602 INFO  [org.apache.kafka.clients.consumer.internals.AbstractCoordinator] (EE-ManagedExecutorService-default-Thread-1) [Consumer clientId=consumer-1, groupId=pinfo-microservices] Successfully joined group with generation 14
instrument-service    | 2019-02-28 07:55:54,606 INFO  [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] (EE-ManagedExecutorService-default-Thread-1) [Consumer clientId=consumer-1, groupId=pinfo-microservices] Setting newly assigned partitions [instrumentsReq-0]
valuation-service     | 2019-02-28 07:55:54,604 INFO  [org.apache.kafka.clients.consumer.internals.AbstractCoordinator] (EE-ManagedExecutorService-default-Thread-1) [Consumer clientId=consumer-1, groupId=pinfo-microservices] Successfully joined group with generation 14
valuation-service     | 2019-02-28 07:55:54,611 INFO  [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] (EE-ManagedExecutorService-default-Thread-1) [Consumer clientId=consumer-1, groupId=pinfo-microservices] Setting newly assigned partitions [instruments-0]

In another console, check the running containers:

docker ps

 
CONTAINER ID        IMAGE                             COMMAND                  CREATED              STATUS              PORTS                                               NAMES
f7af748fe9ae        unige/valuation-service:latest    "/bin/sh -c 'java -D…"   3 minutes ago       Up 3 minutes        0.0.0.0:12080->8080/tcp                             valuation-service
aea66ab35500        unige/instrument-service:latest   "/bin/sh -c 'java -D…"   3 minutes ago       Up 3 minutes        0.0.0.0:11080->8080/tcp                             instrument-service
2df3d6a8d6aa        confluentinc/cp-kafka:5.1.0       "/etc/confluent/dock…"   33 hours ago         Up 4 minutes        0.0.0.0:9092->9092/tcp                              kafka
f197de9c79fe        zookeeper:3.4.9                   "/docker-entrypoint.…"   33 hours ago         Up 4 minutes        2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp          zookeeper

Now we are ready to test the microservices. Let’s check again that we can query counterparties.

curl -X GET http://localhost:10080/counterparties/724500J4K3Q60O9QLF45

{"lei":"724500J4K3Q60O9QLF45","name":"Ton Smit Onroerend Goed B.V.","legalAddress":{"firstAddressLine":"Van Teylingenweg 126","city":"Kamerik","region":"","country":"NL","postalCode":"3471GG"},"registration":{"registrationAuthorityID":"RA000463","registrationAuthorityEntityID":"52431649","jurisdiction":"NL","legalFormCode":"54M6","category":"","registrationDate":1545264000000,"lastUpdated":1545264000000,"registrationStatus":"ISSUED","nextRenewalDate":1576800000000},"status":"ACTIVE"}

Then let’s get a specific instrument:

curl -X GET http://localhost:11080/instrument/1

{"id":1,"brokerLei":"254900LAW6SKNVPBBN21","counterpartyLei":"969500CHL179N00GX059","originalCurrency":"EUR","amountInOriginalCurrency":539926.20,"dealDate":-61630035780000,"valueDate":-61630035780000,"instrumentType":"B","isin":"BE7261065565","quantity":5445,"maturityDate":1577837340000}

Next, we will propagate all the instruments to the message broker for the valuation service to read them and compute the actual valuation.

curl -X POST http://localhost:11080/instrument/propagateAllInstruments

Finally, we query the valuation service for the value of the portfolio in USD:

curl -X GET "http://localhost:12080/valuation?currency=USD"

{"breakdownByInstrumentType":{"STOCK":376127254.270,"LOAN":317483580.00,"BOND":468433784.120,"DEPOSIT":71056222.00,"WARRANT":4847202.120},"breakdownByCurrency":{"CHF":70073308.00,"SGD":66540948.00,"EUR":913601713.74,"GBP":102726326.00,"USD":85005746.77},"reportingCurrency":"USD","currentValue":1237948042.510,"percentile95":0.0,"percentile99":0.0}
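A quick sanity check on the response above: both breakdowns should add up to currentValue. The sketch below hard-codes the figures copied from the JSON and sums them with awk; the helper name `sum_all` is made up for this example.

```shell
# Print the total of all arguments with two decimals.
sum_all() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# Figures copied from breakdownByInstrumentType and breakdownByCurrency.
by_type=$(sum_all 376127254.270 317483580.00 468433784.120 71056222.00 4847202.120)
by_currency=$(sum_all 70073308.00 66540948.00 913601713.74 102726326.00 85005746.77)

echo "by instrument type: $by_type"
echo "by currency:        $by_currency"
```

Both totals come out at 1237948042.51, the currentValue of the response, which suggests that the per-currency amounts are already expressed in the reporting currency.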

Tip: Congrats, you just got all the microservices and the message broker running.

Bibliography

[1] Poulton, N. (2017). Docker Deep Dive.
[2] Turnbull, J. (2014). The Docker Book: Containerization is the new virtualization.
[3] Vocale, M., Fugaro, L. (2018). Hands-On Cloud-Native Microservices with Jakarta EE.
[4] Bharathan, R. (2015). Apache Maven Cookbook.

Microservice Architecture - Part 2 (SSO, Logging, and all that)

2019-02-17T22:51:20+01:00

In Part 1, we discussed how to compile and deploy the microservices. Remember that the microservices themselves are only a part of the microservice architecture. By its very nature, a microservice architecture is distributed, which comes with a lot of benefits and some constraints. One of these constraints is that all the non-functional features such as security, logging, and testability have to take distribution into account. Think of the microservice architecture as a city, where the microservices are the people working in the city. In the city, you also need policemen, firefighters, teachers, and healthcare providers to keep it up and running. The higher the number of people working in the private sector (a.k.a. microservices), the higher the need for non-operational people (a.k.a. utilities).

Compiling the UI

This sample microservice architecture does not focus much on the UI, which mainly serves the purpose of showing how to integrate it with the rest of the architecture. We will not dive into details. Suffice it to say that the example was built with Angular 7.0 and the ngx-admin dashboard. In development, the UI is compiled by npm running on top of Node.js.

$ cd web-ui
$ node --version

10.15.0

$ npm --version

6.5.0

$ npm install

...
audited 31887 packages in 68.922s

$ npm run-script build

69% building modules 1280/1296 modules 16 active ...components\footer\footer.component.scssDEPRECATION WARNING on line 1, column 8 of 
Including .css files with @import is non-standard behaviour which will be removed in future versions of LibSass.
Use a custom importer to maintain this behaviour. Check your implementations documentation on how to create a custom importer.

Date: 2019-02-21T09:13:40.912Z
Hash: e3a111b6560428e93784
Time: 76066ms
chunk {app-pages-pages-module} app-pages-pages-module.js, app-pages-pages-module.js.map (app-pages-pages-module) 3.16 MB  [rendered]
chunk {main} main.js, main.js.map (main) 1.92 MB [initial] [rendered]
chunk {polyfills} polyfills.js, polyfills.js.map (polyfills) 492 kB [initial] [rendered]
chunk {runtime} runtime.js, runtime.js.map (runtime) 8.84 kB [entry] [rendered]
chunk {scripts} scripts.js, scripts.js.map (scripts) 1.32 MB  [rendered]
chunk {styles} styles.js, styles.js.map (styles) 3.99 MB [initial] [rendered]
chunk {vendor} vendor.js, vendor.js.map (vendor) 7.17 MB [initial] [rendered]

Tip: You just compiled the UI based on Angular 7.0

Composing the microservices

At this point, we have all the necessary components. Let’s put everything together by starting the different Docker compositions. The order in which we start the compositions is important as there are dependencies:

docker-compose-microservices.yml starts the Kafka message broker and the microservices. We already tested this in Part 1 to prove that all the microservices are available.
docker-compose-log.yml starts an Elasticsearch, Logstash, and Kibana (ELK) suite alongside a Logspout companion container to take care of logs. This aggregates all the logs from all the containers and concentrates them into Elasticsearch using Logstash. Kibana can then be used to analyze the logs, extract some intelligence, raise alerts, and so on.
docker-compose-api-gw.yml starts an api-gateway that routes the calls to the services and handles security by delegating authentication to an identity manager called Keycloak. It also serves static content and acts as TLS termination.

Important: On Linux systems, you may get the following error message: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]. If that is the case, run the following command as root in a console: sysctl -w vm.max_map_count=262144
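The check behind that error message can be reproduced before starting the stack. A minimal sketch, assuming a Linux host where the current value is readable from /proc/sys/vm/max_map_count; the helper name `max_map_count_too_low` is hypothetical and 262144 is the minimum quoted in the error message:

```shell
# Succeeds when the given vm.max_map_count value is below the
# 262144 minimum quoted in the error message.
max_map_count_too_low() {
  [ "$1" -lt 262144 ]
}

# Read the current value on Linux; silently skip the check elsewhere.
if [ -r /proc/sys/vm/max_map_count ]; then
  current=$(cat /proc/sys/vm/max_map_count)
  if max_map_count_too_low "$current"; then
    echo "vm.max_map_count is $current: run 'sysctl -w vm.max_map_count=262144' as root"
  else
    echo "vm.max_map_count is $current: OK"
  fi
fi
```

Keep in mind that a value set with sysctl -w does not survive a reboot; to make it permanent, the setting has to go into the sysctl configuration of the host (for instance /etc/sysctl.conf).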

$ cd  docker-compose
$ docker-compose -f docker-compose-microservices.yml up &
$ docker-compose -f docker-compose-log.yml up &
$ docker-compose -f docker-compose-api-gw.yml up &

counterparty-service    | 2019-03-12 20:06:27,203 INFO  [stdout] (default task-1)         counterpar0_.registrationStatus as registr16_0_,
counterparty-service    | 2019-03-12 20:06:27,203 INFO  [stdout] (default task-1)         counterpar0_.status as status17_0_
counterparty-service    | 2019-03-12 20:06:27,203 INFO  [stdout] (default task-1)     from
counterparty-service    | 2019-03-12 20:06:27,205 INFO  [stdout] (default task-1)         Counterparty counterpar0_
kafka                   | [2019-03-12 20:10:15,744] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
kafka                   | [2019-03-12 20:20:15,651] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
kafka                   | [2019-03-12 20:30:15,652] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
kafka                   | [2019-03-12 20:40:15,652] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
kafka                   | [2019-03-12 20:50:15,654] INFO [GroupMetadataManager brokerId=1] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
...
kibana           | {"type":"response","@timestamp":"2019-03-12T20:53:56Z","tags":[],"pid":1,"method":"get","statusCode":302,"req":{"url":"/","method":"get","headers":{"user-agent":"curl/7.29.0","host":"localhost:5601","accept":"*/*"},"remoteAddress":"127.0.0.1","userAgent":"127.0.0.1"},"res":{"statusCode":302,"responseTime":3,"contentLength":9},"message":"GET / 302 3ms - 9.0B"}
kibana           | {"type":"response","@timestamp":"2019-03-12T20:54:01Z","tags":[],"pid":1,"method":"get","statusCode":302,"req":{"url":"/","method":"get","headers":{"user-agent":"curl/7.29.0","host":"localhost:5601","accept":"*/*"},"remoteAddress":"127.0.0.1","userAgent":"127.0.0.1"},"res":{"statusCode":302,"responseTime":8,"contentLength":9},"message":"GET / 302 8ms - 9.0B"}
....
api-gateway         | 192.168.128.15 - - [12/Mar/2019:20:02:36 +0000] "POST /plugins HTTP/1.1" 409 213 "-" "curl/7.29.0"                                                                        
api-gateway         | 2019/03/12 20:02:36 [notice] 41#0: *139 [lua] init.lua:393: insert(): ERROR: duplicate key value violates unique constraint "plugins_cache_key_key"                       " 
api-gateway         | Key (cache_key)=(plugins:oidc::::) already exists., client: 192.168.128.15, server: kong_admin, request: "POST /plugins HTTP/1.1", host: "api-gateway:8001"
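Since the three compositions are launched in the background one right after the other, nothing in the commands themselves enforces the start order. One possible way to wait between two steps is to poll the health status that Docker reports for a container. The sketch below assumes the container defines a healthcheck (as kong-database does); the helper name `wait_until_healthy` and its default values are made up for this example:

```shell
# Hypothetical helper: poll a container until Docker reports it as healthy.
# Only meaningful for containers that define a healthcheck.
# Usage: wait_until_healthy <container> [max_attempts] [delay_seconds]
wait_until_healthy() {
  container=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Health status is one of: starting, healthy, unhealthy.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$container is not healthy after $attempts attempts" >&2
  return 1
}
```

For instance, `wait_until_healthy kong-database && echo "database is ready"` blocks for at most a minute with the default values.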

If everything went according to plan, you now have a working application ecosystem at https://localhost. Point your browser to https://localhost and you’ll get a nice UI.

  
  Angular 7.0 UI to the financial-app

If you point it to the counterparty microservice at https://localhost/api/v1/counterparty, the API gateway detects that you are not authenticated and redirects you to the SSO platform to enter your credentials. Enter user1/user1.

  
  Keycloak SSO login form

Once authenticated, you get redirected to the original URL you requested (https://localhost/api/v1/counterparty).

  
  JSON result of the counterparty  microservice that returns all counterparties.

Tip: Kudos, you just completed the installation of a complete microservice ecosystem locally on your machine.

Dissecting the docker-composes

As stated previously, docker-compose composes several containers together to deliver a solution, for instance by starting the database first and then whatever service requires a database. With docker-compose, you set the same parameters, environment variables, and volumes that you would when starting a container from the command line.

From a general point of view, a docker-compose yaml file defines a series of services (e.g., database, microservice, web server) and then a series of “shared” services such as volumes, networks and so on.

Let’s take the example of the docker-compose-api-gw.yml file.

First, version "2.1" defines the version of the syntax. Then services defines a section with a series of services.
In the example below, the first service is called kong-database and is based on a PostgreSQL database version 10, as stated by image: postgres:10. The name of the container (for instance, what will appear if you run docker ps) is kong-database. The hostname is also kong-database.
What follows is a section that describes the networks the container participates in. This is very useful to isolate the containers from one another from a network perspective.
The environment section defines environment variables (similar to -e on the command line).
The healthcheck section defines rules to state whether or not a container is healthy and ready for prime time.
The kong-database example does not expose ports, but it could do so by defining a ports section that lists the mappings of the ports of the container to the ports of the host system. 80:7070 means that port 7070 of the container is mapped to port 80 (HTTP) of the host system.
Finally, the volumes section maps volumes from the host system to directories in the container. This is very useful to save the state of the container (e.g., database files) or to put custom configurations in place.

version: "2.1"

services:

  kong-database:
    image: postgres:10
    container_name: kong-database
    hostname: kong-database
    networks:
      - backend-network
    environment:
      POSTGRES_USER: kong
      POSTGRES_PASSWORD: kong
      POSTGRES_DB: kongdb
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "kong", "-d", "kongdb"]
      interval: 30s
      timeout: 30s
      retries: 3
    volumes:
      - pgdata-kong:/var/lib/postgresql/data
...
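To make the link with the command line concrete, here is a rough equivalent of the kong-database service written as a plain docker run. It is only a sketch: the wrapper function `run_kong_database` is made up, the network is assumed to exist already (docker network create backend-network), and docker-compose may prefix network and volume names with the project name.

```shell
# Rough command-line equivalent of the kong-database service above.
# Hypothetical wrapper; assumes the backend-network network already exists.
run_kong_database() {
  docker run -d \
    --name kong-database \
    --hostname kong-database \
    --network backend-network \
    -e POSTGRES_USER=kong \
    -e POSTGRES_PASSWORD=kong \
    -e POSTGRES_DB=kongdb \
    --health-cmd "pg_isready -U kong -d kongdb" \
    --health-interval 30s \
    --health-timeout 30s \
    --health-retries 3 \
    -v pgdata-kong:/var/lib/postgresql/data \
    postgres:10
}
```

Each option maps to one entry of the YAML file: container_name to --name, networks to --network, environment to -e, healthcheck to the --health-* options, and volumes to -v.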

After that very quick introduction to docker-compose let’s have a look at the services delivered by the three docker-compose files of the demo:

`docker-compose-log.yml`: Providing a logging infrastructure

Microservice architectures are distributed by nature, and therefore cross-cutting concerns such as logging must take that aspect into account and aggregate the logs of the different containers. Without that, it would be difficult to follow a user request that goes across many services to deliver the final value.

To implement it, we rely on the Logspout log router. Logspout primarily captures all the logs of all the running containers and routes them to a log concentrator. Logspout itself does not do anything with the logs; it just routes them to something. In our case, that something is Logstash.

Logstash is part of the ELK stack and is a pipeline that concentrates, aggregates, and filters the logs and stashes them in a database, usually Elasticsearch. Logstash depends on Elasticsearch and gets its configuration from a volume shared from the host (./elk-pipeline/). For more details about the Logstash configuration, please refer to ./elk-pipeline/logstash.conf.

Elasticsearch stores, indexes, and searches large amounts of data. Like Logstash, it is distributed in nature. Elasticsearch is started first in docker-compose-log.yml because other services such as Logstash and Kibana depend on it. Elasticsearch maps a host volume (esdata1) to its own data directory (/usr/share/elasticsearch/data). Thanks to that mapping, data is not lost when the container is stopped or if it crashes.

The final piece of the puzzle is Kibana, which visualizes the data stored in Elasticsearch to do business intelligence on the logs. This is very useful to get a clear and real-time status of the solution. Kibana depends on Elasticsearch and exposes its interface on port 5601 of the host.

`docker-compose-api-gw.yml`: Providing api-gateway services

Microservice architectures are usually composed of a lot of services. Keeping track of these while providing and maintaining a clear API quickly becomes challenging. Besides, the granularity of microservices often calls for composing them to deliver actual added value. Different applications might also have different needs: for instance, a mobile app might need a different API than a web app. Furthermore, we often want to secure some services, for instance by using the OAuth2 protocol connected to an identity provider to offer Single Sign-On (SSO) on the services.

In our case, the API gateway is called Kong and it requires a database. The docker-compose-api-gw.yml describes the following services:

kong-database, a PostgreSQL database version 10 that holds the API gateway configuration.

api-gateway, the API gateway itself. It is based on the Docker image unige/api-gateway that we built previously when we ran mvn clean install -Ppackage-docker-image at the root of the project. To get more details on how the image has been built, look at the Dockerfile in the api-gateway/src/main/docker directory. The image is based on kong:1.1rc1-centos but it is customized in several ways:

There is an additional plugin to support OpenID.
A customized docker-entrypoint.sh starts Kong as root so that we can attach it to ports 80 and 443, which are privileged.
A custom nginx template enables serving static content (the UI).
A shell script called config-kong.sh configures the API gateway by defining the services and the routes to these services. This script is run by the container called api-gateway-init after the api-gateway has started and is labelled as healthy. The first line defines a service called counterparty-service that routes requests to the microservice at http://counterparty-service:8080/counterparties. The host name counterparty-service is the one given by the microservice configuration. The second line creates a route in the API gateway to the previous service, in that case /api/v1/counterparty. Please note that the api-gateway can take care of versioning. Finally, the last line configures the OpenID plugin to provide authentication by telling the plugin to use the api-gateway client of the apigw realm of Keycloak.

#Creates the services.
curl -S -s -i -X POST --url http://api-gateway:8001/services --data "name=counterparty-service" --data "url=http://counterparty-service:8080/counterparties"
...
#Creates the routes
curl -S -s -i -X POST  --url http://api-gateway:8001/services/counterparty-service/routes --data "paths[]=/api/v1/counterparty" 
...
#Enable the Open ID Plugin
curl -S -s -i -X POST  --url http://api-gateway:8001/plugins --data "name=oidc" --data "config.client_id=api-gateway" --data "config.client_secret=798751a9-d274-4335-abf6-80611cd19ba1" --data "config.discovery=https%3A%2F%2Flocalhost%2Fauth%2Frealms%2Fapigw%2F.well-known%2Fopenid-configuration"

iam-db, a database for Keycloak, the SSO software.

iam, the SSO service, which is based on Keycloak v4.8.3. Please note that a complete configuration is loaded initially using master.realm.json. This configuration creates the required realm, client, and configuration to provide authentication to the api-gateway.

`docker-compose-microservices.yml`: Microservices and message broker

Finally, the last composition contains the microservices themselves. As you can see, most of the configuration is not required for the microservices themselves but for the infrastructure around them. All the services belong to the backend-network network.

First, it defines a ZooKeeper container that provides distributed configuration management, naming, and group services. ZooKeeper maintains its state in two shared volumes that are respectively mapped to the ./target/zk-single-kafka-single/zoo1/data and ./target/zk-single-kafka-single/zoo1/datalog directories of the host. ZooKeeper is a mandatory component for the message broker.

Second, it defines a Kafka container. Kafka is a robust and fast message broker that excels at exchanging messages in a distributed way. It depends on ZooKeeper and exposes its port 9092 on the same port of the host. It also saves its state on a mapped volume on the host.

Then, the counterparty service is an actual microservice (finally!) that exposes its port 8080 on port 10080 of the host. This microservice is based on Thorntail.

The instrument service is special as it connects to the message broker (Kafka) to send messages that will be read later on by the valuation service.

The other microservices, valuation-service and regulatory-service, are more of the same.

Tip: Kudos, you completed the tour of the microservice sample. Next chapter dives into a bit of theory.

My two cents on SVN vs GIT

2016-02-27T11:13:50+01:00

Important: This post is 3 years old and a lot has changed in the meantime. I would now recommend upgrading to Git, especially because the Cloud ecosystem is extremely dynamic. That being said, do it in a managed way, starting with a low-profile project.

Executive summary: Yes, Git is functionally richer and has several quality attributes that are really better than SVN’s, but let’s face it: it comes with an extra layer of complexity. This complexity may (and did in several instances I am aware of) result in time-to-market issues, tensions during stressful phases, loss of commits, dreadful merges, and major impacts on the release management process. Hence, the question is:

Are the extra functionalities worth the investment and the risks?

In my opinion, Git is worth the investment if your developers are meant to work off site or offline. If not (e.g., standard banking and financial industries) and if you already have an up-and-running SVN-based software factory, then do not bother to migrate existing projects to Git. Try a proof of concept on a new low-profile project and do not hesitate to invest in training and to re-engineer your existing release management processes. Once your teams have some experience with it, you can decide to migrate the existing codebase. That being said, if you have no SCM, then start with Git directly.

Introduction

In many companies, SVN is still the SCM of choice (and yes, some of them have just finished migrating from CVS). In the open-source world, the situation is very different: Git is the de facto winner. As developers tend to stay up to date with the latest technologies, they want to migrate to the coolest new thing. I keep having discussions about Git and whether or not a company or a department should migrate to it. In this blog post, I explain my position and my arguments.

Git vs SVN

There are plenty of very detailed comparisons between Git and SVN out there. Among them, let me cite the following articles: [1] [2] [3].

Here is a selection of the most important functional and non-functional Git features.

Decentralization

Let me start with the obvious: Git is decentralized by design. This is great for highly collaborative and disseminated teams. Actually, this is its most important feature: Git has been designed to address exactly that problem. This also has nice side effects:

Working offline.
Cloning a repository makes it possible to quickly fork a project, make a couple of tests, and submit a new version (by means of pull requests) even if you do not have commit rights.

This is great, but it is probably not the most useful feature for teams working in closed businesses such as banking systems and other platforms with very little mobility (for security reasons). Moreover, in these cases having commit rights is definitely not an issue.

Disseminated teams are a very compelling argument to migrate from SVN to Git. Of course, the converse is also true: for a co-located team, this argument carries much less weight.

Branches

Branches are feared in SVN and first-class citizens in Git. This is definitely a game changer as it allows much better release management procedures.

Performance

In both performance (checkout, check-in) and space, Git is the winner. For instance, by using Git the Mozilla project gained a factor of 30 in terms of space.

Useful features

Git comes with some nice tools that really improve the productivity of experienced developers. Here is a non-exhaustive list:

git bisect is great for investigating regressions and discovering when a given bug was introduced.
git stash takes the uncommitted changes to tracked files (staged or not) and stores them away. This is useful if you want to break apart a number of changes into several commits, or have changes that you don’t want to get rid of (e.g., with “git reset”) but also don’t want to commit. git stash puts the changes onto the stash and git stash pop applies them back to the current working copy. By default, it operates as a LIFO (“Last In, First Out”) stack.
Branches are lightweight and merging is easy, and I mean really easy. Git is distributed: basically, every repository is a branch. It’s much easier to develop concurrently and collaboratively than with Subversion, in my opinion. It also makes offline development possible.
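To see git stash at work, here is a throwaway demo, assuming git is installed. The repository lives in a temporary directory, and the file name notes.txt and the demo identity are made up for the example:

```shell
# Throwaway demo of git stash / git stash pop in a temporary repository.
# A dummy identity is exported so that commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "initial version"

echo "v2 (work in progress)" > notes.txt   # uncommitted change
git stash -q                                # working copy is back to the commit
after_stash=$(cat notes.txt)
git stash pop -q                            # the change is restored
after_pop=$(cat notes.txt)

echo "after stash: $after_stash"
echo "after pop:   $after_pop"
```

After git stash the file shows the committed content again, and git stash pop brings the work in progress back.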

Some important differences to know

SVN has predictable and simple version numbers, whereas Git identifies commits by SHA-1 hashes.
Git tracks contents rather than files.
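One way to see what tracking contents means: Git derives the id of an object from the bytes it contains, so two files with identical content get the same id, whatever their names. A small sketch, assuming git is installed; the file names are made up:

```shell
# The object id depends only on the content, not on the file name.
dir=$(mktemp -d)
printf 'same content\n' > "$dir/a.txt"
printf 'same content\n' > "$dir/b.txt"

# git hash-object computes the id Git would give to each file's content.
id_a=$(git hash-object "$dir/a.txt")
id_b=$(git hash-object "$dir/b.txt")

echo "a.txt: $id_a"
echo "b.txt: $id_b"
```

Both lines print the same id, which is why Git stores identical content only once.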

Conclusion

Yes, Git is functionally richer and has several quality attributes that are really better than SVN’s, but let’s face it: it comes with an extra layer of complexity. This complexity may (and did in several instances I am aware of) result in time-to-market issues, tensions during stressful phases, loss of commits, dreadful merges, and major impacts on the release management process. In my opinion, Git is worth the investment if your developers are meant to work off site or offline. If not (e.g., standard banking and financial industries) and if you already have an up-and-running SVN-based software factory, then do not bother to migrate existing projects to Git. Try a proof of concept on a new low-profile project and do not hesitate to invest in training and to re-engineer your existing release management processes. Once your teams have some experience with it, you can decide to migrate the existing codebase. That being said, if you have no SCM, then start with Git directly.

Bibliography

[1] http://stackoverflow.com/questions/871/why-is-git-better-than-subversion

[2] https://git.wiki.kernel.org/index.php/GitSvnComparison

[3] http://www.codeforest.net/git-vs-svn

[4] http://nvie.com/posts/a-successful-git-branching-model/

[5] https://github.com/nvie/gitflow

[6] https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow

A (small) hitchhiker’s guide to Data Warehousing

2016-02-10T14:07:42+01:00

In my current assignment, I had the opportunity to discuss with Data Warehouse (DWH) experts how the DWH integrates with the rest of the information system. I noticed that not all stakeholders (including Data Warehouse professionals) use the same vocabulary. During the discussions, people raised words such as “Data Warehouse”, “Data Marts”, “ODS”, “Data Lake” and so on. Some of the words were used interchangeably, which does not help to follow the discussion. As I was not familiar with several of them, I decided to do my homework and to come up with a small glossary to provide a common ground for further discussions.

Disclaimer : I am not an expert in the field, I only tried to come up with a couple of definitions to get a common ground for further discussions. Experts in the field, please help me improve this!

Business Intelligence (B.I.)

Business intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information. It allows business users to make informed business decisions with real-time data that can put a company ahead of its competitors.

Boris Evelson - Forrester

Business Intelligence Systems usually (but not always) rely on a Data Warehouse to provide information out of raw operational data. The following diagram shows the relationships between the different levels involved in making a decision. Information emerges from consolidated data, thus helping the user to improve her knowledge on a given subject. Using this knowledge, she can then make an informed decision.

  
  How B.I. helps Decision making

Business Intelligence Systems have the following properties:

They leverage raw heterogeneous operational data
They enable multi-dimensional information and operations on it
They are driven by the business
They have to be performant and must not interfere with daily operations

Data Warehouse (DWH)

A DWH is a central repository of integrated data from one or more disparate sources. Its purpose is to organize and homogenize data into information. Users can then turn this information into knowledge and therefore make informed decisions (see B.I.).

There are three main approaches on how to build a data warehouse.

William Inmon’s approach

According to William Inmon, who originally coined the term “Data Warehouse”, a data warehouse has the following properties:

Subject oriented : This implies that data are organized around the business and not around the sources. For instance, several accounting data sources are consolidated into one accounting data warehouse. The purpose is to let information emerge out of data.
Integrated : Coming from different sources, data must be standardized to enable consistency and thus let information emerge. For instance, customer identification must be normalized across different sources.
Non-volatile : Once in the data warehouse, data must not be altered. Therefore, data is available for future comparison.
Time-variant : Changes made to data over time are tracked. For instance, each and every change to a customer’s country of residence is tracked.
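
The non-volatile and time-variant properties can be illustrated with a small Java sketch. This is only an illustration under stated assumptions: the ResidenceHistory class, its Version type and the method names are hypothetical and do not come from any data warehouse product. The idea is that a change to a customer's country of residence appends a new version instead of overwriting the previous one, so that past states remain available for comparison.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical append-only history of a customer's country of residence. */
final class ResidenceHistory {

    /** One immutable version of the fact, valid from a given date. */
    static final class Version {
        final String country;
        final LocalDate validFrom;

        Version(String country, LocalDate validFrom) {
            this.country = country;
            this.validFrom = validFrom;
        }
    }

    private final List<Version> versions = new ArrayList<>();

    /** Time-variant: a change appends a new version instead of overwriting the old one. */
    void record(String country, LocalDate validFrom) {
        if (!versions.isEmpty()
                && !validFrom.isAfter(versions.get(versions.size() - 1).validFrom)) {
            throw new IllegalArgumentException("Versions must be recorded in chronological order");
        }
        versions.add(new Version(country, validFrom));
    }

    /** Returns the country valid at the given date, or null if nothing was known yet. */
    String countryAt(LocalDate date) {
        String result = null;
        for (Version v : versions) {
            if (!v.validFrom.isAfter(date)) {
                result = v.country;
            }
        }
        return result;
    }

    /** Non-volatile: callers only get a read-only view of the full history. */
    List<Version> history() {
        return Collections.unmodifiableList(versions);
    }
}
```

With such a structure, a question like "where did this customer live in 2012?" can still be answered after the customer has moved, which is what the time-variant property is about.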

Inmon’s model follows a top-down approach. First, a complete (enterprise-wide) Data Warehouse (DW) is created in third normal form (3NF : avoiding duplication and having referential integrity) and then, if required, datamarts (DM) are provisioned out of the DW. Datamarts in Inmon’s model are in 3NF, and the OLAP cubes are built from them. For Inmon, data quality and coherency are paramount, hence the 3NF in both the DW and the DMs.

  
  The Inmon Top-Down model

Ralph Kimball’s approach

According to Kimball, another prominent actor in this field, “Data Warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making.” This definition does not contradict Inmon’s properties. The difference lies in the architecture.

Kimball’s model follows a bottom-up approach. First, some datamarts (DM) emerge, sourced directly from OLTP (Online Transaction Processing) systems. They usually follow the company’s processes and organization.
The Datamarts are either in 3NF (OLAP cubes are built on top of them) or de-normalized star schemas.

  
  The Kimball Bottom-Up model

The Hybrid model - A typical architecture

Both of these approaches have their pros and cons. Kimball’s model is easy to start with because of the bottom-up approach: you can start small and scale up eventually. Moreover, the ROI is usually better with Kimball’s model. However, because of this approach, it is difficult to create re-usable data structures and operations (extraction) for different datamarts. Finally, you may end up with consistency problems. On the other hand, Inmon’s approach is structured and easier to maintain at the cost of being rigid and more expensive.

Real-life DWH implementations often end up using a hybrid architecture. The following architecture relies on the following building blocks:

a staging layer to extract and sanitize data
an ODS to enable “close to the operation” data mining
a 3NF DWH with full history to enable the creation of unanticipated datamarts
datamarts that rely either on 3NF or on the star schema for better performance

  
  A typical Hybrid Architecture

Data Mart (DMT)

A datamart is essentially a basic building block of the data warehouse. It is a subject-oriented subset of a Data Warehouse. A data mart does not explicitly imply the presence of a multi-dimensional technology such as OLAP, nor the presence of summarized numerical data.

Operational Data Store (ODS)

An operational data store (ODS) is a building block of the Data Warehouse used for immediate reporting with operational data. An ODS contains lightly transformed and lightly integrated operational data with a (rather) short time window. The ODS is usually used when looking for specific events (settling a banking movement or looking for a specific operation). Full history is available in the DWH.

OLAP Cubes

OLAP cubes are multidimensional arrays of data coming from a relational database. They enable operations such as slicing and dicing (projection), drill-down/up, and roll-up. A datamart relying on a star schema provides equivalent functionalities. However, in a cube every projection/aggregation is pre-computed (which enables the discovery of new patterns), whereas in a star schema only some projections/aggregations (the ones you know are of interest) are pre-computed.
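
To make the slicing and roll-up operations more concrete, here is a minimal Java sketch. It is not how an OLAP engine works internally (nothing is pre-computed here); it only shows what the operations mean on a tiny list of facts. The CubeSketch class, the Sale type, the dimensions (country, year, product) and the sample figures are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sales facts with three dimensions (country, year, product) and one measure. */
final class CubeSketch {

    static final class Sale {
        final String country;
        final int year;
        final String product;
        final long amount;

        Sale(String country, int year, String product, long amount) {
            this.country = country;
            this.year = year;
            this.product = product;
            this.amount = amount;
        }
    }

    /** Made-up sample data, for illustration only. */
    static final List<Sale> FACTS = Arrays.asList(
            new Sale("CH", 2015, "Loan", 100),
            new Sale("CH", 2016, "Loan", 150),
            new Sale("FR", 2015, "Loan", 80),
            new Sale("FR", 2015, "Card", 20));

    /** Slice: fix one dimension (the year) and keep the matching facts. */
    static List<Sale> sliceByYear(int year) {
        return FACTS.stream()
                .filter(s -> s.year == year)
                .collect(Collectors.toList());
    }

    /** Roll-up: aggregate the measure along one dimension (here, per country). */
    static Map<String, Long> rollUpByCountry(List<Sale> facts) {
        return facts.stream().collect(
                Collectors.groupingBy(s -> s.country, TreeMap::new,
                        Collectors.summingLong(s -> s.amount)));
    }

    public static void main(String[] args) {
        // Totals per country over all years
        System.out.println(rollUpByCountry(FACTS));
        // Totals per country restricted to the 2015 slice
        System.out.println(rollUpByCountry(sliceByYear(2015)));
    }
}
```

A cube would store the result of every such aggregation in advance, while a star schema would typically compute most of them at query time.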

Conclusion

Business Intelligence is a set of practices/methodologies that turn raw data into decisions. Data warehouses, data marts and cubes are building blocks used to build Business Intelligence systems.

Integration Tests with Docker And Arquillian

2016-01-30T23:38:50+01:00

Last month I decided to add a touch of microservices to the JEE course I teach at the University of Geneva. I ended up with a couple of microservices and, as expected, I came across the challenge of their integration testing. This post is NOT specifically about microservices. It rather focuses on the JEE integration testing experience I encountered while building the microservices. A dedicated blog post will follow on my journey through the building of microservices with JEE.

A short description of the architecture

From a technological perspective, I am using JEE 7 on Wildfly with MySql. Therefore, my microservices are wars composed of 1-2 EJBs plus a restful service that exposes the logic. Typically, my microservices are composed of 4-5 classes of max 150 lines of code each. On my laptop, a microservice deploys in less than 5 seconds. I mainly need to test EJBs and their database calls. I also want to test integration between microservices.

Why Docker?

As it serves as an example for a course, I want it to be super easy to install/re-install 20 times if necessary. Furthermore, I want fast-paced deployment. To that end, Docker is a great tool because I do not have to bother about which laptop/computer the students work on (provided they can run Docker Toolbox). Moreover, all the tools/middlewares I am using for the course are already packaged as Docker images.

Building a Docker image for a Wildfly Integration Test Server

As mentioned previously, I propose to use Wildfly + MySql as the runtime environment. However, at test time, I do not want to start both an application server and a database. More importantly, I want to get a fresh database for each and every test suite. Furthermore, I would like to use a simpler setup for the application server. For instance, I prefer to use an in-memory H2 database instead of MySql. I also do not want to bother with LDAP/JAAS configuration, clustering, etc. Of course, the datasource name, the realm name and more generally all the resources required by the microservices must be present with their production names.

One of Wildfly’s great features is its ability to be configured using the command line. The first step is to configure a data source relying on H2 that has the same name as the production one.

# First step : Add the datasource
data-source add --name=StudentsDS --driver-name=h2 --jndi-name=$STUDENTS_DS --connection-url=$H2_URI --user-name=$H2_USER --password=$H2_PWD --use-ccm=false --max-pool-size=25 --blocking-timeout-wait-millis=5000

Then, let’s configure a realm. This is helpful when integration tests rely on principals and do verify the security. The file jee7-demo-realm-users.properties (resp. jee7-demo-realm-roles.properties) defines the users (resp. the roles) of the realm.

# Add a property file realm
/subsystem=security/security-domain=jee7-demo-realm:add(cache-type=default)
/subsystem=security/security-domain=jee7-demo-realm/authentication=classic:add()
/subsystem=security/security-domain=jee7-demo-realm/authentication=classic/login-module=UsersRoles       \
    :add(code=UsersRoles, flag=required,                                                        \
         module-options={"usersProperties"=>"${JBOSS_CUSTOMIZATION}/jee7-demo-realm-users.properties",   \
                         "rolesProperties"=>"${JBOSS_CUSTOMIZATION}/jee7-demo-realm-roles.properties"})

Finally, let’s add an admin user in order to allow Arquillian or an IDE to interact with this application server.

/opt/jboss/wildfly/bin/add-user.sh admin admin

Hereafter is the complete configuration script, config_wildfly.sh.

#!/bin/bash

# Usage: config_wildfly.sh [WildFly mode] [configuration file]
#
# The default mode is 'standalone' and default configuration is based on the
# mode. It can be 'standalone.xml' or 'domain.xml'.

JBOSS_HOME=/opt/jboss/wildfly
JBOSS_CUSTOMIZATION=$JBOSS_HOME/customization
JBOSS_STANDALONE_CONFIG=$JBOSS_HOME/standalone/configuration/
JBOSS_CLI=$JBOSS_HOME/bin/jboss-cli.sh
JBOSS_MODE=${1:-"standalone"}
JBOSS_CONFIG=${2:-"$JBOSS_MODE.xml"}

function wait_for_server() {
  until $JBOSS_CLI -c ":read-attribute(name=server-state)" 2> /dev/null | grep -q running; do
    sleep 1
  done
}

echo "=> Starting WildFly server"
$JBOSS_HOME/bin/$JBOSS_MODE.sh -b 0.0.0.0 -c $JBOSS_CONFIG &

echo "=> Waiting for the server to boot"
wait_for_server

echo "=> Executing the commands"
export STUDENTS_DS="java:/StudentsDS"
export H2_URI="jdbc:h2:mem:STUDENTS_DB;DB_CLOSE_DELAY=-1"
export H2_USER="sa"
export H2_PWD="sa"

$JBOSS_CLI -c << EOF
batch

echo "Connection URL: " $H2_URI

# First step : Add the datasource
data-source add --name=StudentsDS --driver-name=h2 --jndi-name=$STUDENTS_DS --connection-url=$H2_URI --user-name=$H2_USER --password=$H2_PWD --use-ccm=false --max-pool-size=25 --blocking-timeout-wait-millis=5000 

# Then configure a realm that relies on property files
/subsystem=security/security-domain=jee7-demo-realm:add(cache-type=default)
/subsystem=security/security-domain=jee7-demo-realm/authentication=classic:add()
/subsystem=security/security-domain=jee7-demo-realm/authentication=classic/login-module=UsersRoles       \
    :add(code=UsersRoles, flag=required,                                                        \
         module-options={"usersProperties"=>"${JBOSS_CUSTOMIZATION}/jee7-demo-realm-users.properties",   \
                         "rolesProperties"=>"${JBOSS_CUSTOMIZATION}/jee7-demo-realm-roles.properties"})

# Execute the batch
run-batch
EOF

# Finally, let's add an admin that can be used by the IDE to deploy the tests
/opt/jboss/wildfly/bin/add-user.sh admin admin 

echo "=> Shutting down WildFly"
if [ "$JBOSS_MODE" = "standalone" ]; then
  $JBOSS_CLI -c ":shutdown"
else
  $JBOSS_CLI -c "/host=*:shutdown"
fi

The next step is to enhance the official Wildfly image called jboss/wildfly:latest with the specific configurations required for integration testing. The following Dockerfile describes how to build the image that we will use for testing. First, it adds the previous configuration files ./config_wildfly.sh ./jee7-demo-realm-roles.properties ./jee7-demo-realm-users.properties to the /opt/jboss/wildfly/customization/ directory of the image. Then we tell Docker to run the configuration script config_wildfly.sh and to do some cleanup. After that, it records the state as a new image.

FROM jboss/wildfly:latest

ADD ./config_wildfly.sh ./jee7-demo-realm-roles.properties ./jee7-demo-realm-users.properties /opt/jboss/wildfly/customization/

RUN ["/opt/jboss/wildfly/customization/config_wildfly.sh"]
RUN rm -rf  /opt/jboss/wildfly/standalone/configuration/standalone_xml_history
CMD ["/opt/jboss/wildfly/bin/standalone.sh", "--debug", "-b", "0.0.0.0", "-bmanagement", "0.0.0.0"]
EXPOSE 8787

Now we can build an image called jee7-test-wildfly using the following command (in the directory where the Dockerfile lives):

docker build --no-cache --rm -t jee7-test-wildfly .

The following command runs the image we just built, exposes (and maps) the ports 8080, 9990 and 8787, and mounts the local directory /Users/XXXXXXXXXX/tmp/docker-deploy on the container’s /opt/jboss/wildfly/standalone/deployments/ directory. This is of course the directory in which the integration tests are to be deployed.

docker run -d  -p 8080:8080 -p 9990:9990 -p 8787:8787 -v /Users/XXXXXXXXXX/tmp/docker-deploy:/opt/jboss/wildfly/standalone/deployments/:rw jee7-test-wildfly

Now that we have a container running an application server with an in-memory database and a simple realm, we can configure the test harness.

How to configure Arquillian

Arquillian is a JEE integration test framework. It allows testing JEE components such as EJBs or web services. The first step is to tell Arquillian where the Wildfly server lives. The following arquillian.xml file states that the integration tests should deploy on a Wildfly container that listens at 192.168.99.100 (the address at which the Docker container is reachable) on port 9990 (which is the administration port). Furthermore, it declares the admin username and password.

<?xml version="1.0"?>
<arquillian xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="http://jboss.org/schema/arquillian"
	xsi:schemaLocation="http://jboss.org/schema/arquillian
  http://jboss.org/schema/arquillian/arquillian_1_0.xsd">

	<container qualifier="wildfly" default="true">
		<configuration>
			<property name="managementAddress">192.168.99.100</property>
			<property name="managementPort">9990</property>
			<property name="username">admin</property>
			<property name="password">admin</property>
		</configuration>
		<protocol type="Servlet 3.0">
			<property name="host">192.168.99.100</property>
			<property name="port">8080</property>
		</protocol>
	</container>

	<extension qualifier="jacoco">
		<property name="includes">ch.demo.*</property>
	</extension>
</arquillian>

A simple integration Test

Let us now write a simple integration test. First, we must tell Arquillian that it is in charge of running the test (@RunWith(Arquillian.class)).

@RunWith(Arquillian.class)
public class StudentServiceImplTest {

To test a given component (let’s say an EJB), Arquillian expects a well-formed Java archive (either a jar or a war). The following test archive is composed of a package containing the EJB under test (ch.demo), an empty beans.xml to enable CDI and finally a persistence.xml to enable JPA.

@Deployment
public static JavaArchive create() {
   return ShrinkWrap.create(JavaArchive.class, "integration-test-demo.jar").addPackages(true, "ch.demo")
      .addAsManifestResource(EmptyAsset.INSTANCE, "beans.xml")
      .addAsManifestResource("test-persistence.xml", "persistence.xml");
}

The previous JAR as well as the tests are packaged as a WAR and deployed on the application server declared in the arquillian.xml file. The following test injects the EJB under test implementing the StudentService interface and tests its add method.

@Inject
StudentService service;

@Test
public void shouldAddReturnAllWithTheNewStudent() {
   int nbStudentsBeforeTest = service.getNbStudent();
   service.add(new Student("Doe", "Jane", new Date(), new PhoneNumber("+33698075273")));
   int nbStudentsAfterTest = service.getAll().size();
   // assertEquals compares values; assertSame would compare Integer references
   Assert.assertEquals(nbStudentsBeforeTest + 1, nbStudentsAfterTest);
   Assert.assertEquals("Doe", service.getAll().get(nbStudentsAfterTest - 1).getLastName());
}

Maven, IDE Integration and Coverage

Finally, let me add that it is possible to get coverage data from Arquillian by enabling extensions. The following snippet enables the jacoco extension. This produces coverage data that can be used by the EclEmma plugin for Eclipse.

<extension qualifier="jacoco">
   <property name="includes">ch.demo.*</property>
</extension>

Conclusion

Docker and Arquillian provide a nice and seamless way to do JEE integration testing. Nevertheless, I had a hard time at the beginning because Arquillian’s error handling is not very good when the test archive cannot be deployed. In this case, make sure that you package your test correctly (in the method annotated @Deployment). In particular, double check beans.xml, web.xml and the JAR/WAR structure. It really helped me to unzip the deployed test archive to figure out what went wrong in my code.

Cleaning Intermediate Docker Images

2015-12-02T01:56:34+01:00

I recently started to use Docker. It is a great tool that significantly increases developer’s productivity. However, I regularly encounter disk space problems when developing new images. Indeed, I sometimes end up with dangling images and containers. Hereafter, a simple script that cleans up most of them.

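#!/bin/bash
# Remove untagged (<none>) images; the image ID is cut from fixed columns of the listing
docker rmi -f `docker images | grep "^<none>" | cut -c41-52`
# Remove dangling images
docker rmi $(docker images -q -f dangling=true)
# Remove stopped containers (IDs listed by "ps -a" but not by "ps") and their volumes
docker rm -v $(echo $(docker ps -q --no-trunc) $(docker ps -a -q --no-trunc) | sed 's|\s|\n|g'  | sort | uniq -u)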

Communication between Java Enterprise Applications

2014-12-26T22:51:48+01:00

Recently I came across the following problem : How to propagate information from one enterprise application to another in a transparent manner? Transparent meaning without changing the API, that is, without adding transversal information to the services’ parameters. The typical use case is to propagate information such as the language, applicative security role information (not the JAAS role), or the session-id. Moreover, I would like the information to be “request scoped” and to be automatically cleaned up at the end of the request. This is important to avoid memory leaks and to enforce isolation for security reasons. Let me add that I currently work with JEE6 on WebSphere 8.5.5.

I did some research among blogs and forums and I found the following solutions:

Passing information in thread-local. This consists in putting information in a Map stored in a ThreadLocal. Although this solution is very simple to implement, information is only transmitted inside the current thread. This means that @Asynchronous calls will not get access to the information. Similarly, it is not transmitted through RMI calls that spread over several VMs, as is usually the case in distributed applications.
Using the JNDI. This solution does not suffer from the above limitations as it is distributed by nature, but scoping must be implemented on top of the existing JNDI implementation. I think that it may be possible to implement something like a custom CDI scope but it seems rather complex.
TransactionSynchronizationRegistry (TSR). This alternative is well documented here. This solution works on JEE application servers. It looks great at first sight, but it does not support any use case in which different transactions (or no transaction) are involved. This rules out any information sharing before a transaction has started, when a transaction has been suspended, or when a new transaction is started. Again, I can imagine that it would be possible to propagate the content using interceptors, but it is too much plumbing code for me.
Work Area Service (WAS). This is basically the IBM implementation of solution 2 with scoping. Documentation is clear and it seems easy to implement. Of course, the main drawback is that it is vendor-specific. IBM started a JSR a long time ago but it was dropped.
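
The limitation of the thread-local approach can be reproduced with plain Java, outside of any application server. The following is a minimal sketch, not the code of a real application: the ThreadLocalContext class and its methods are hypothetical names chosen for the illustration. It stores a value in a ThreadLocal map and then reads it back from another thread, which does not see it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical request context stored in a ThreadLocal map. */
final class ThreadLocalContext {

    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    /** Manual cleanup: must be called at the end of the request to avoid leaks on pooled threads. */
    static void clear() {
        CONTEXT.remove();
    }

    /** Reads the key from a freshly started thread and returns what that thread sees. */
    static String readFromOtherThread(String key) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(get(key)));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the worker", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        put("language", "fr");
        System.out.println("same thread:  " + get("language"));
        // The worker thread has its own (empty) map, so the value is not visible there
        System.out.println("other thread: " + readFromOtherThread("language"));
        clear();
    }
}
```

The same reasoning applies to an asynchronous call dispatched to another thread, and even more so to a call that lands on another virtual machine.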

Let us now enumerate several criteria to make a decision about which way to go:

Support for asynchronous calls : during a request it may be necessary to dispatch the processing among several threads and I would like the shared information to be accessible by any thread involved in this request.
Is (automatically) “request scoped” : if the shared information is not automatically collected at the end of the request we may end up with memory leaks. Manual collection is never a good option.
Supports Remote Calls : for a given request, we may end up calling several services (EJBs) on other servers and I would like to have an automatic propagation of the information among the cluster nodes.
Performance : to be useful, the information sharing must be ubiquitous and therefore it must be cheap in terms of resources.
Vendor Independence : as far as possible an application must rely on known and portable APIs such as JEE. Locking the application to a specific vendor is, in my opinion, essentially a problem for maintenance. Migrating from one application server to another happens only rarely.

Solution	Async	Scope	RMI	Vendor Indep.
Thread-Local	X	OK	X	OK
JNDI	OK	X	OK	OK
TSR	X	OK	OK	OK
WAS	OK	OK	OK	X

As you can see there is no silver bullet here. I went for the vendor-specific solution. It can be nicely encapsulated to isolate the dependency on vendor-specific code. Furthermore, several servers have similar mechanisms and it can therefore be adapted. Here is why it was not possible in my setup to use the other alternatives:

The Thread-local solution is not acceptable because it does not support Remote Method Invocation on several virtual machines.
The JNDI solution requires implementing the scoping mechanism. This can be tricky and it is definitely not my area of expertise.
The TransactionSynchronizationRegistry is JEE compliant but it requires huge machinery to support asynchronous calls as well as transaction suspension and re-creation (REQUIRES_NEW, NOT_SUPPORTED, NEVER). Basically, it does not work if there is not one and only one transaction throughout the request.
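
One possible way to isolate the dependency on vendor-specific code is to hide it behind a small interface. The sketch below is an assumption about how such an encapsulation could look, not the code actually used: RequestContext, InMemoryRequestContext and GreetingService are hypothetical names, and the vendor-specific implementation (which would delegate to the server's own API) is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical vendor-neutral API: the only type the application code depends on. */
interface RequestContext {
    void put(String key, String value);

    String get(String key);
}

/**
 * Stand-in implementation, e.g. for tests. A vendor-specific class would implement
 * the same interface and delegate to the application server's mechanism.
 */
final class InMemoryRequestContext implements RequestContext {
    private final Map<String, String> values = new HashMap<>();

    @Override
    public void put(String key, String value) {
        values.put(key, value);
    }

    @Override
    public String get(String key) {
        return values.get(key);
    }
}

/** Example service: it reads the propagated language without knowing where it comes from. */
final class GreetingService {
    private final RequestContext context;

    GreetingService(RequestContext context) {
        this.context = context;
    }

    String greet() {
        return "fr".equals(context.get("language")) ? "Bonjour" : "Hello";
    }
}
```

With this kind of structure, moving to another server with a similar mechanism would mean writing one new implementation of the interface, while the services stay untouched.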

Bibliography

[1] Adam Bien. HOW TO PASS CONTEXT IN STANDARD WAY - WITHOUT THREADLOCAL. http://www.adam-bien.com/roller/abien/entry/how_to_pass_context_in

[2] Adam Bien. HOW TO PASS CONTEXT BETWEEN LAYERS WITH THREADLOCAL AND EJB 3.(1). http://www.adam-bien.com/roller/abien/entry/how_to_pass_context_with

[3] IBM. Work area partition service. http://www-01.ibm.com/support/knowledgecenter/SSEQTJ_8.0.0/com.ibm.websphere.nd.multiplatform.doc/info/ae/workarea/concepts/cwa_partition.html

Fakes, Stubs, Dummy, Mocks, Doubles and all that…

2014-05-18T10:58:00+02:00

In this post, I look at the different kinds of objects used for test purposes. By this, I mean objects that are used to make a test run. This article focuses on component testing, a.k.a. unit testing (I do not like the term unit testing because it is too often confused with the technology behind it, e.g., JUnit, TestNG). Although a great number of resources already exist on that subject, it was very difficult for me to understand the differences between the different kinds of test objects. This is partly due to the fact that different authors use different terms for the same object and the same term for different objects [1]. To be as didactic as possible, I also chose to add some blocks of code. Please note that these blocks are only here for the sake of clarity. This is not the way I would recommend doing stubbing, faking, and mocking. Consider Mockito and PowerMockito for that. These are amazing tools for that purpose. They deserve a post of their own to discuss good practices. This post is in no way an exhaustive state of the art; I only tried to select the terms that, in my opinion, are clear and consensual enough. To that end, I used a number of sources that can be found in the bibliography section.

Here are the main reasons to use different objects during the test phase and in production:

Performance: the actual object contains slow algorithms and heavy calculations that may impair test performance. A test should always be fast so as not to discourage regular runs, and therefore to identify problems as soon as possible. The worst case is the one in which the developer must deploy and run the entire application to test a single use case.
States: sometimes the constellation under test happens rarely. This is the case for events that occur with a low probability, such as race conditions, network failures, etc.
Non-deterministic: this is the case of components that have interactions with the real-world such as sensors.
The actual object does not exist: for instance, another team is working on it and is not yet ready.
To instrument the actual dependency: for instance to spy the calls of the CUT to one of its dependencies.

Double objects

Test double is the generic term that groups all the categories of objects that are used to fulfill one or several of the previous requirements. The term was coined by Gerard Meszaros in [2]. In rough terms, test doubles look like the actual object they double. They satisfy, to different extents, the original interface and offer a subset of the behaviors expected by the specification. This helps to isolate the problem and to reduce the double implementation to the strict minimum.

There exist different kinds of test doubles for different purposes. They have in common that they can be used instead of the actual component without breaking the contract syntactically.

The next figure describes a simple test setup that does not use test doubles. To test the Component Under Test (CUT), the following test setup uses its actual dependencies (another component). The setup phase is trivial as there is nothing to do. The exercise phase calls the CUT with the proper parameters (direct inputs), which in turn calls its dependency (indirect outputs). AnotherComponent returns its result to the CUT (indirect inputs), which uses it to complete the work and then finally returns the overall result (direct outputs). The terms “direct inputs”, “indirect outputs”, and so on come from [2].

  
  Overview of a test setup

Now let us say that “AnotherComponent” is either too complex, not yet implemented or has a non-deterministic behavior. In those cases, it is easier to use another implementation of “AnotherComponent” that behaves exactly as expected for a specific scenario.

Hereafter, a simple example to illustrate the rest of the post. The class CUTImpl that realizes the contract CUT implements the component under test. The CUT uses a component that realizes the interface AnotherComponent. For the sake of clarity, the following example injects the dependencies through the constructor. To improve loose coupling, it is possible to rely on dependency injection.

package ch.demo.business.service;

public class CUTImpl implements CUT {
	
	AnotherComponent component;

    public CUTImpl(AnotherComponent c) {
        this.component = c;
    }

	@Override
	public String doBusiness(String param, Integer delta) {
		return component.inc(Integer.valueOf(param)).toString();
	}
}

package ch.demo.business.service;

public interface CUT {
	public String doBusiness(String string, Integer delta);
}

package ch.demo.business.service;

public interface AnotherComponent {
	Integer inc(Integer param);
}

package ch.demo.business.service;

public class AnotherComponentImpl implements AnotherComponent {

    public Integer inc(Integer param) {
        if (param == null) {
            throw new IllegalArgumentException("Param must be not null!");
        } else if (param == Integer.MAX_VALUE) {
            throw new IllegalStateException("Incrementing MAX_VALUE will result in overflow!");
        } else {
            return param + 1;
        }
    }
}

The following test uses real implementations of the different components.

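package ch.demo.business.service;

import org.junit.Assert;
import org.junit.Test;

public class CUTTest {

    @Test
    public void testInc() {
        // The CUT contract exposes doBusiness(String, Integer), which returns a String
        Assert.assertEquals("doBusiness(\"3\") != \"4\"", "4", new CUTImpl(new AnotherComponentImpl()).doBusiness("3", 1));
    }

}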

The question is what if:

AnotherComponentImpl is not ready yet
AnotherComponentImpl depends itself on external services or specific hardware resources
AnotherComponentImpl has non-deterministic behaviors.

Dummy objects

Dummy objects are meant to satisfy compile-time checks and runtime execution. Dummies do not take part in the test scenario. Some method signatures of the CUT may require objects as parameters. If neither the test nor the CUT cares about these objects, we may choose to pass in a Dummy Object. This can be a null reference, an empty object or a constant. Dummy objects are passed around (to dependencies for instance) but never actually used. Usually they are just used to fill parameter lists. They are meant to replace input/output parameters of the components that the CUT interacts with.

In the current example, the parameter delta of the doBusiness method can be set to null or any Integer value without interfering with the test. Of course, this might be different for another test.

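package ch.demo.business.service;

import org.junit.Assert;
import org.junit.Test;

public class CUTTest {

    @Test
    public void testInc() {
        // delta is a dummy here: doBusiness does not use it, so null is good enough
        Assert.assertEquals("doBusiness(\"3\") != \"4\"", "4", new CUTImpl(new AnotherComponentImpl()).doBusiness("3", null));
    }

}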

Stub objects

Stub objects provide simple answers to the CUT invocations. A stub does not answer to scenarios that are not foreseen by the current test. In other terms, it is a simplified fake object. Stub objects may trigger paths in the CUT that would otherwise not be executed.

The next figure presents a test that relies on a test stub. First, the test case sets up a stub object. This object responds to the expected CUT invocations in order to enact a given scenario. This is very useful to check indirect inputs with rarely occurring values.

  
  Test setup that uses a test stub

Back to the example, the following program illustrates how to use a stub to check specific indirect inputs. This stub shows that the CUT relies on the fact that AnotherComponent does not return null, as it would otherwise raise a NullPointerException.

package ch.demo.business.service;

public class AnotherComponentStub implements AnotherComponent {

    public Integer inc(Integer param) {
        return null;
    }

}

package ch.demo.business.service;

import org.junit.Test;

public class CUTTest {

    // Without any modification of the CUT implementation, the null returned by the stub
    // makes doBusiness raise a NullPointerException
    @Test(expected = NullPointerException.class)
    public void testIncWhenAnotherComponentReturnsNull() {
        new CUTImpl(new AnotherComponentStub()).doBusiness("3", 1);
    }

}

Fake objects

Fake objects have working implementations, but they may simplify some behaviors. This makes them not suitable for prime time. The idea is that the object actually displays some real behavior but not everything. While a Fake Object is typically built specifically for testing, it is not used as either a control point or an observation point by the test. The most common reasons for using fake objects are that the real component is not available yet, is too slow or cannot be used during tests because of side effects.

  
  Test setup that uses a fake

The following fake simulates most of the behaviors except for the limits (Integer.MAX_VALUE, null, etc.).

package ch.demo.business.service;

public class AnotherComponentFake implements AnotherComponent {

    public Integer inc(Integer param) {
        return param + 1;
    }

}

As the fake covers many scenarios, it can be used to test the general behavior of the CUT.

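package ch.demo.business.service;

// JUnit 4 is assumed here
import org.junit.Assert;
import org.junit.Test;

public class CUTTest {

    @Test
    public void testIncWhenAnotherComponentIsFake() {
        // The fake answers any ordinary input, so several values can be checked in one test
        CUT cut = new CUTImpl(new AnotherComponentFake());
        Assert.assertEquals("inc(3) != 4", 4, cut.inc(3, 1));
        Assert.assertEquals("inc(123) != 124", 124, cut.inc(123, 1));
    }
}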

Mock objects

A mock object partially implements the interface and provides a way to verify that the calls it receives conform to the specification. Mock objects are pre-programmed with expectations that form a specification of the calls they are expected to receive. In fact, mocks are a certain kind of stub or fake. However, on top of acting as simple stubs or fakes, mock objects provide a flexible way to specify more directly how the function under test should actually operate. In this sense they also act as a kind of recording device: they keep track of which of the mock object’s methods are called, with what kind of parameters, and how many times.

Whenever the assertions are made on the test double rather than on the CUT, the double is a mock.

  
  Test setup that uses a mock

The following example uses Mockito to provide easy mocking. Note that the final Mockito.verify calls check whether the mock was called with given parameters. In other words, we check that the CUT did not filter the input parameter.

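package ch.demo.business.service;

import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.Mockito;
// Mockito 2+ package; Mockito 1.x ships the runner as org.mockito.runners.MockitoJUnitRunner
import org.mockito.junit.MockitoJUnitRunner;

// The runner initializes the @Mock and @InjectMocks fields before each test
@RunWith(MockitoJUnitRunner.class)
public class CUTTest {

    @Mock
    AnotherComponent ac;

    // Mockito instantiates the CUT and injects the mock above into it
    @InjectMocks
    CUTImpl cut;

    @Test
    public void testIncWhenAnotherComponentIsMock() {
        // Pre-program the answers the mock gives to the CUT
        Mockito.when(ac.inc(Integer.MIN_VALUE)).thenReturn(Integer.MIN_VALUE + 1);
        Mockito.when(ac.inc(3)).thenReturn(4);
        Mockito.when(ac.inc(123)).thenReturn(124);

        Assert.assertEquals("inc(Integer.MIN_VALUE) != Integer.MIN_VALUE + 1",
                Integer.MIN_VALUE + 1, cut.inc(Integer.MIN_VALUE, 1));
        Assert.assertEquals("inc(3) != 4", 4, cut.inc(3, 1));
        Assert.assertEquals("inc(123) != 124", 124, cut.inc(123, 1));

        // Verifies that the method inc of AnotherComponent was called with parameter Integer.MIN_VALUE
        Mockito.verify(ac).inc(Mockito.eq(Integer.MIN_VALUE));
        // Verifies that the inc method has been called three times.
        Mockito.verify(ac, Mockito.times(3)).inc(Mockito.anyInt());
    }
}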

Test Spy

According to Meszaros [2], a test spy is basically a recorder that is able to save the interactions between the CUT and the spy for later verifications.

  
  Test setup that uses a test spy

On the other hand, Mockito considers that a spy is a real implementation in which you change only some specific behaviors. Instead of specifying every behavior one by one, you take an existing object that does most of the work and override only very specific behaviors.
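The two readings of "spy" can be sketched by hand, without any mocking library. This is only an illustration under assumptions: the nested AnotherComponent interface mirrors the one used in the examples above, while RealComponent, RecordingSpy, PartialSpy and the value returned for null are hypothetical names and choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class SpySketch {

    /** Same shape as the AnotherComponent interface used in the examples above. */
    public interface AnotherComponent {
        Integer inc(Integer param);
    }

    /** Hypothetical stand-in for the real implementation (assumed behavior: param + 1). */
    public static class RealComponent implements AnotherComponent {
        @Override
        public Integer inc(Integer param) {
            return param + 1;
        }
    }

    /** Spy in the sense of Meszaros: delegates to the real object and records every call. */
    public static class RecordingSpy implements AnotherComponent {
        private final AnotherComponent delegate;
        public final List<Integer> receivedParams = new ArrayList<>();

        public RecordingSpy(AnotherComponent delegate) {
            this.delegate = delegate;
        }

        @Override
        public Integer inc(Integer param) {
            receivedParams.add(param); // saved for later verification by the test
            return delegate.inc(param);
        }
    }

    /** Spy in the sense of Mockito: real behavior, except for one overridden case. */
    public static class PartialSpy extends RealComponent {
        @Override
        public Integer inc(Integer param) {
            if (param == null) {
                return 0; // overridden behavior, chosen arbitrarily for this sketch
            }
            return super.inc(param);
        }
    }

    public static void main(String[] args) {
        RecordingSpy spy = new RecordingSpy(new RealComponent());
        spy.inc(3);
        spy.inc(123);
        System.out.println(spy.receivedParams);         // interactions recorded during the test
        System.out.println(new PartialSpy().inc(null)); // only the null case differs from the real object
    }
}
```

In a real test the CUT would receive the spy instead of the real component, and the assertions on receivedParams would come after the CUT has been exercised.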

Conclusion

To sum up:

A dummy is just there to enable compilation and is not supposed to be part of the test.
A stub provides simple answers to the invocations foreseen by the test and nothing more.
A fake is a partial implementation that can be used either in a component test or in a deployed setting.
A mock is a partial implementation that enables asserting on the component interactions.
A spy is either a recorder for later use or a proxy on a real implementation that is used to override some specific behaviors.

Bibliography

[1] Martin Fowler. Mocks Aren’t Stubs.
[2] Meszaros, Gerard (2007). xUnit Test Patterns: Refactoring Test Code. Addison-Wesley. ISBN 978-0-13-149505-0.
[3] Friends you can depend on.
[4] Wikipedia. Mock Object.
[5] Wikipedia. Test Doubles.
[6] Wikipedia. Fakes.
[7] Wikipedia. Stubs.

Testing levels

2014-04-18T10:52:00+02:00

Introduction

In this post, I would like to discuss a number of definitions around the testing activity. Having these definitions in mind helps to organize this crucial activity. In a previous post, I discussed the difference between verification and validation. If the difference is not clear to you, please have a look at it prior to reading this post.

Let me start with the definition of testing. Software testing helps to measure the quality of a piece of software in terms of defects. It is crucial to understand that “testing shows the presence, not the absence of bugs” [1].

This comes from the fact that exhaustive testing is not possible due to a phenomenon called the state space explosion [2]. The idea is that exhaustive testing would require a structure in memory that remembers all the tested states of the system, a state of the system being the concatenation of its variables. For instance, let us take a program that has two variables:

an integer (4 bytes = 32 bits)
an array of ASCII characters of length 10 (10 bytes = 80 bits)

The number of states to explore is 2^112 ~ 5x10^33 (for comparison, the number of atoms in the observable universe is estimated at about 10^80). Remembering every state at 14 bytes each would require about 7.2x10^22 terabytes of memory. Although many optimizations can be brought to a brute force approach [1,2], the problem remains huge. Therefore, exhaustive testing is not an option.
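The arithmetic behind these figures can be reproduced with a few lines of Java. This is a minimal sketch: the class and method names are made up for the illustration, one terabyte is taken as 10^12 bytes, and each stored state is assumed to occupy exactly its 112 bits (14 bytes) with no bookkeeping overhead.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

public class StateSpaceEstimate {

    /** Number of distinct states of a system whose variables occupy the given number of bits. */
    public static BigInteger stateCount(int bits) {
        return BigInteger.ONE.shiftLeft(bits); // 2^bits
    }

    /** Terabytes (10^12 bytes) needed to remember every state once, at bits / 8 bytes per state. */
    public static BigDecimal terabytesToStoreAll(int bits) {
        BigInteger totalBytes = stateCount(bits).multiply(BigInteger.valueOf(bits / 8));
        return new BigDecimal(totalBytes).divide(BigDecimal.TEN.pow(12), new MathContext(3));
    }

    public static void main(String[] args) {
        int bits = 32 + 80; // the integer plus the 10-character ASCII array
        System.out.println("states:    " + stateCount(bits));
        System.out.println("terabytes: " + terabytesToStoreAll(bits));
    }
}
```

Adding a single extra byte to the program state multiplies both numbers by 256, which is why the growth is described as an explosion.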

Another important point is to understand where defects originate. A (software) defect originates in a human mistake (e.g., a misunderstanding) that produces a fault (i.e., a defect, a bug) in the code. Under certain circumstances, the faulty code will end up doing something unexpected with respect to the user requirements. This is called a failure.

To sum up, there is the following causality chain : Mistake –> Fault –> Failure.

This demonstrates that testing is not only a matter of detecting failures: it can start earlier in the chain. Of course, the earlier the defect is detected, the cheaper it is to address. For instance, informing the developer about the business may avoid a mistake, and using an automated code checker may detect some faults.
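The difference between a fault and a failure in the chain above can be made concrete with a small example. It is an illustration only: the midpoint method is hypothetical and was chosen because its fault stays silent for most inputs.

```java
public class FaultVersusFailure {

    /**
     * Intended to return the midpoint of a and b.
     * The fault: a + b is computed in 32-bit arithmetic and can overflow.
     */
    public static int midpoint(int a, int b) {
        return (a + b) / 2;
    }

    public static void main(String[] args) {
        // The fault is present but not triggered: the result is correct.
        System.out.println(midpoint(2, 4)); // prints 3

        // The same fault now causes a failure: the sum overflows and the result is negative.
        System.out.println(midpoint(Integer.MAX_VALUE, Integer.MAX_VALUE)); // prints -1
    }
}
```

A test suite that only uses small values would never observe the failure, even though the fault is there from the first line written.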

Testing dimensions

Testing can be characterized in terms of dimensions. These dimensions help to categorize the test types.

What : This dimension describes the objectives of the tests. Test objectives vary from one approach to another. Usually the objectives are the verification or the validation of functional (e.g., portfolio performance) and non-functional (e.g., performance, security) requirements.
How : This defines how the test objective is achieved. For instance, tests can be static or dynamic, run in isolation or in integration, and written with or without knowledge of the implementation.
When : Tests can be executed at different moments of the development process. For instance, component testing can be done very early in the development process, while user acceptance tests can only be performed when the software is ready for production.
Who : Different kinds of tests are run by different people (e.g., developers, testers, end-users). For instance, component testing can be done by programmers, while user acceptance tests are performed by end-users.

Testing level

Testing levels have been addressed in a number of publications, blog posts and talks [3], [4], [5]. Testing levels describe test types by their quantity and by when they occur in the software lifecycle. At the base, tests are done early in the development and extensively. The higher the level, the later the test occurs in the lifecycle. Moreover, while lower levels are usually done by the software supplier, higher levels tend to be performed by the customer.

  
  Testing Levels

Static testing

This sort of testing does not require executing the code. Tools crawl the code and look for patterns that can lead to faults. Examples of such tools are FindBugs and PMD. This kind of testing is especially useful to detect complex mistakes involving thread-safety or typing.
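As an illustration of the kind of pattern a static analyzer can report without running anything, consider a string comparison by reference. The class and method names below are hypothetical, and the exact rule that fires depends on the tool and its configuration.

```java
public class StaticAnalysisExample {

    /** Faulty: == compares object references, not the characters of the strings. */
    public static boolean isAdminFaulty(String role) {
        return role == "admin";
    }

    /** Fixed: equals compares contents, and calling it on the literal tolerates a null role. */
    public static boolean isAdmin(String role) {
        return "admin".equals(role);
    }

    public static void main(String[] args) {
        String role = new String("admin"); // a distinct object with the same content
        System.out.println(isAdminFaulty(role)); // prints false
        System.out.println(isAdmin(role));       // prints true
    }
}
```

The faulty version may well pass a dynamic test that happens to use the literal "admin", which is exactly why a check on the source code itself is a useful complement.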

Unit testing

Test objects are isolated components (classes, packages, programs, etc.). To promote isolation, test doubles such as stubs, fakes or mocks can be used. For more information, see Fakes, Stubs, Dummies, Mocks and all that. These tests happen during development and discovered bugs are fixed right away. Therefore, the management overhead is minimal. Verification of both functional and non-functional requirements can be addressed.

Integration testing

Integration testing (a.k.a. assembly testing) verifies the integration between several components. At this level, some components can still be faked to ease deployment and isolation. Verification of both functional and non-functional requirements can be addressed.

API testing

This is the first test level that addresses validation instead of verification. It tests the software using its contracts (API). This is pure black-box testing, usually performed through web services. Tools such as SoapUI are very good at testing the software API and semantics.

GUI testing

This level acts on the graphical user interface. Selenium is an example of a tool used at this level.

System testing

This test level targets the system as a whole, with all its internal and external components. Verification of both functional and non-functional requirements can be addressed.

Acceptance testing

Verification of both functional and non-functional requirements can be addressed.

Bibliography

[1] Dijkstra (1969). In J.N. Buxton and B. Randell, eds, Software Engineering Techniques, April 1970, p. 16. Report on a conference sponsored by the NATO Science Committee, Rome, Italy, 27–31 October 1969. Possibly the earliest documented use of the famous quote.

[2] Antti Valmari. The state explosion problem. In Wolfgang Reisig and Grzegorz Rozenberg, editors, Lectures on Petri Nets I: Basic Models, volume 1491 of Lecture Notes in Computer Science, pages 429–528. Springer, 1998.

[3] Martin Fowler. TestPyramid. http://martinfowler.com/bliki/TestPyramid.html

[4] Alister Scott. Yet another software testing pyramid. http://watirmelon.com/2011/06/10/yet-another-software-testing-pyramid/

[5] Alister Scott. Introducing the software testing ice-cream cone (anti-pattern). http://watirmelon.com/2012/01/31/introducing-the-software-testing-ice-cream-cone/
