There are many reasons for this, but the main one is that what we are building (a telecommunications call routing and billing platform) is big and complicated enough without the additional cognitive load and risk associated with newer, less mature technologies.
While we do think it’s crucial to stay abreast of the cutting edge and the direction of the software development industry as a whole, we rarely see incorporating any relatively young technology into our solution as a good idea.
So, starting from the top…
TLDR: It's React and Redux
Right out of the gate we present our biggest contradiction to the introductory paragraph – an instance where we chose to take a less mature path.
It was a calculated decision – we went in knowing that choosing the new hotness could end up biting us – but we could never have foreseen the sheer extent of it. The only consolation was that a significant part of the industry was experiencing the same pain along with us. This post by Jose Aguinaga sums it up beautifully.
Thankfully, once React came along the insanity ended, because we had found something worth settling on. Gradually we started making the switch to React and Redux, one component at a time. We have separate portals for our clients’ telco staff and for their customers, and often both need to present the same functionality. React’s componentised nature makes this possible: we extract common functionality into NPM modules, which we publish independently to our Nexus NPM repo for reuse.
API / Middle Tier
Here’s where we get back to our typical, more traditional roots.
We use the Java EE platform deployed onto JBoss EAP instances configured in a cluster. Java itself, while a much maligned language, is notoriously easy to grok. In the beginning we briefly considered going with a more “modern” language like Scala, but were put off by its tendency to provide a dizzying number of ways to solve the same problem, which we figured could make it difficult to share one codebase between many developers, who may each adopt a different style even within the bounds of the Scala language.
We do recognise that this would be seen as a useful feature on a project with different goals and priorities to ours, and it’s not intended as a criticism of Scala generally. Also, Scala, along with other options like Ceylon, would have (at the time) required us to discard our many years of experience with Java EE, so ultimately it was Java EE that won the day, and Java was the good-enough language that went along with it. Those were the days of Java 7, and now that Java 8 has arrived there is less to dislike about the Java language, particularly with the new functional constructs it introduces.
We publish our API endpoints as JSON over HTTP, using Jackson to map Java POJOs to JSON. We have some standard JAXB annotations, but ultimately had to break out Jackson-specific ones to achieve the level of control we wanted over our JSON output. We use JAX-RS to map Java methods to HTTP endpoints, most of which utilise @Local @Stateless EJBs for container managed transactions and security. The web tier authenticates with the API using a combination of basic authentication over HTTPS and cookie-based single-sign-on (SSO, provided by JBoss EAP). Authorisation is handled by a combination of container managed security roles and custom CDI @Interceptors that wrap each HTTP request and implement intelligent, domain-specific checks depending on the type of payload as it passes into or out of the system.
The overall system was broken into 15 modules during the rigorous design stage. Each module consists of 6 related Maven projects: domain, repository API, repository implementation, service API, service implementation and client. Domain houses the domain objects with their JAXB, Jackson and JPA/Hibernate annotations. The repository implementation uses JPA/Hibernate with a sprinkling of DeltaSpike for simplified method-name-based and criteria-style queries. The separation of “-api” vs “-impl” allows us to encapsulate the repository implementation within its module, and each module’s implementation from the other modules. For example, it provides the option of replacing the repository with some completely different implementation (e.g. switching to a NoSQL database for performance, even if just for one or two specific entities) without the corresponding service code needing to change.
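As a rough illustration of the “-api” vs “-impl” split, here is a minimal sketch in plain Java with no container. All names (Customer, CustomerRepository, InMemoryCustomerRepository, CustomerService) are hypothetical, and the in-memory class merely stands in for a JPA/Hibernate implementation; in the layout described above each type would live in its own Maven project.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object (would live in the "domain" project).
final class Customer {
    final long id;
    final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical "repository-api" contract: the only repository type the service compiles against.
interface CustomerRepository {
    void save(Customer customer);

    Optional<Customer> findById(long id);
}

// Hypothetical "repository-impl": an in-memory stand-in for the JPA/Hibernate version.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<Long, Customer> rows = new HashMap<>();

    @Override
    public void save(Customer customer) {
        rows.put(customer.id, customer);
    }

    @Override
    public Optional<Customer> findById(long id) {
        return Optional.ofNullable(rows.get(id));
    }
}

// Hypothetical "service-impl": depends on the interface, never on a concrete repository.
final class CustomerService {
    private final CustomerRepository repository;

    CustomerService(CustomerRepository repository) {
        this.repository = repository;
    }

    Customer register(long id, String name) {
        Customer customer = new Customer(id, name);
        repository.save(customer);
        return customer;
    }

    String nameOf(long id) {
        return repository.findById(id).map(c -> c.name).orElse("unknown");
    }
}
```

Because CustomerService only sees the interface, swapping InMemoryCustomerRepository for a different backing store would not require any change to the service code.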
Inter-service communication happens either via RPC using @Remote EJB calls or asynchronously via JMS. In most cases the publishing of a message to a JMS queue or topic is abstracted using the CDI event model. For example, when a Customer record is created by the customer module, it fires a @Created Customer CDI event, which is picked up by an @Observes method that in turn converts the Customer into JSON and publishes it to a “customer created” JMS topic for other modules to consume.
This pattern of publishing “create” and “update” events is a key aspect of the distributed architecture of the system, in which each module has its own database. Many of the modules have the concept of a Customer, but each requires a different subset of a customer’s data. For example, the billing module requires a customer’s ABN to be included on their invoice, but the rating module does not need to know it. For this reason, each module independently consumes all @Created and @Updated Customer events (via a JMS topic) and stores just the information that it requires. This decreases runtime coupling between modules, since the billing module can now generate an invoice even if the customer module is offline.
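The idea of each module keeping only its own slice of the Customer can be sketched as follows. This is plain Java under stated assumptions: CustomerTopic is an in-memory stand-in for the JMS topic, the CDI event hop and JSON conversion are left out, and every class name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical event payload; the real system publishes the Customer as JSON.
final class CustomerEvent {
    final long id;
    final String name;
    final String abn;

    CustomerEvent(long id, String name, String abn) {
        this.id = id;
        this.name = name;
        this.abn = abn;
    }
}

// In-memory stand-in for a JMS topic: every subscriber receives every event.
final class CustomerTopic {
    private final List<Consumer<CustomerEvent>> subscribers = new ArrayList<>();

    void subscribe(Consumer<CustomerEvent> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(CustomerEvent event) {
        for (Consumer<CustomerEvent> subscriber : subscribers) {
            subscriber.accept(event);
        }
    }
}

// Billing keeps the ABN because it must appear on invoices.
final class BillingCustomerStore {
    private final Map<Long, String> abnByCustomerId = new HashMap<>();

    void onCustomerEvent(CustomerEvent event) {
        abnByCustomerId.put(event.id, event.abn);
    }

    String abnFor(long customerId) {
        return abnByCustomerId.get(customerId);
    }
}

// Rating only records that the customer exists; it never stores the ABN.
final class RatingCustomerStore {
    private final Set<Long> knownCustomerIds = new HashSet<>();

    void onCustomerEvent(CustomerEvent event) {
        knownCustomerIds.add(event.id);
    }

    boolean knows(long customerId) {
        return knownCustomerIds.contains(customerId);
    }
}
```

Once both stores are subscribed (for example with topic.subscribe(billing::onCustomerEvent)), billing can answer abnFor from its own copy of the data without calling back into the customer module.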
We use 4 separate MySQL database instances, each containing one or more database schemas.
Each module has its own schema and there is no cross-schema access between modules (i.e. inter-module access happens via the API, never the database). Most of the schemas are housed in the “application” database instance. The second and third instances house the SIP server databases for our 2 independent SIP server deployments (more on that below).
The fourth instance is the “archive” database which is intentionally deployed on a dedicated instance. This database is designed to hold all historical records of the more voluminous entities, such as call records. The intent is to identify and isolate the data that we expect to be the most difficult to manage. We expect to need to perform maintenance on that database more than the others, and also to possibly need to convert it to a more scalable technology such as Redshift at some point.
To facilitate those requirements, all data is published to the archive database via a JMS queue with a long, slow retry policy, which allows for extended downtime (days) of the archive server without losing any data. Data that has been pushed to the archive database is only deleted from the application database once it is more than 2 months old, so recent records remain available, with degraded functionality, while the archive database is down.
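The two rules in that paragraph can be sketched in plain Java. ArchiveQueue is a hypothetical in-memory stand-in for the JMS queue (the real retry timing is configured in the broker and is not modelled here), and RetentionPolicy is a hypothetical helper that assumes a record is flagged once it has been pushed to the archive.

```java
import java.time.LocalDate;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// In-memory stand-in for the JMS queue in front of the archive database.
final class ArchiveQueue {
    private final Deque<String> pending = new ArrayDeque<>();

    void enqueue(String record) {
        pending.addLast(record);
    }

    // Tries to deliver pending records in order. A record is removed only after the
    // sink accepts it, so an archive outage leaves everything queued for the next retry.
    int drain(Predicate<String> archiveSink) {
        int delivered = 0;
        while (!pending.isEmpty()) {
            if (!archiveSink.test(pending.peekFirst())) {
                break; // archive unavailable: stop and retry later
            }
            pending.removeFirst();
            delivered++;
        }
        return delivered;
    }

    int size() {
        return pending.size();
    }
}

// Retention rule for the application database: only archived records older than two months go.
final class RetentionPolicy {
    static boolean canDeleteFromApplicationDb(LocalDate recordDate, boolean archived, LocalDate today) {
        return archived && recordDate.isBefore(today.minusMonths(2));
    }
}
```

Calling drain while the archive is down delivers nothing and loses nothing; the same call succeeds once the sink starts accepting records again.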
We use an off-the-shelf SIP server to handle our call traffic.
Multiple instances of the SIP/media server are deployed across 2 separate environments for scaling and redundancy, each fronted by a pair of OpenSIPS gateways. The SIP servers talk to a dedicated MySQL database instance per redundant environment. The SIP server will interrogate our application’s API for each incoming call to determine the routing configuration. If our API is not available, it will fall back to a static configuration stored in one of two possible cloud storage locations (again for redundancy and graceful degradation).
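The fallback order described above amounts to trying a list of sources until one answers. The sketch below shows that logic in plain Java; it is not the SIP server’s actual configuration, and RoutingLookup and its sources are hypothetical names introduced purely for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class RoutingLookup {
    // Each source maps a dialled number to a route. A source signals that it is
    // unavailable by returning empty or by throwing a RuntimeException.
    private final List<Function<String, Optional<String>>> sourcesInPriorityOrder;

    RoutingLookup(List<Function<String, Optional<String>>> sourcesInPriorityOrder) {
        this.sourcesInPriorityOrder = sourcesInPriorityOrder;
    }

    // Returns the route from the first source that can supply one, or empty if none can.
    Optional<String> routeFor(String dialledNumber) {
        for (Function<String, Optional<String>> source : sourcesInPriorityOrder) {
            try {
                Optional<String> route = source.apply(dialledNumber);
                if (route.isPresent()) {
                    return route;
                }
            } catch (RuntimeException unavailable) {
                // Treat any failure as "source down" and fall through to the next one.
            }
        }
        return Optional.empty();
    }
}
```

With the API listed first and the two static configuration locations after it, a call is still routed as long as any one of the three sources responds.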
Testing and Deployment
Provisioning of infrastructure and platforms, plus application deployment, is mostly automated using Ansible. Every pushed commit triggers a new build on Bamboo, in which over 1000 functional tests are executed before a new version of each module is released to our Nexus repository. The new version is then automatically deployed to our staging environment for manual testing and verification.
New features are verified in batches and the corresponding release is “approved” for subsequent release into production, which is a couple of button clicks away. Our staging environment exactly matches our production environment, apart from VM and database instance sizes.
We use a clustered JBoss EAP configuration, and application deployment involves completely stopping, reconfiguring, starting and deploying to each instance in serial. If any instance fails, the process halts, ensuring at least one instance remains running at any given time even in the face of deployment failure.
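The halt-on-failure behaviour can be sketched in a few lines of plain Java. This only illustrates the control flow, not the Ansible automation itself; RollingDeployment and the redeploy predicate (standing in for the stop, reconfigure, start and deploy steps) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class RollingDeployment {
    // Redeploys each instance in turn. The predicate returns false when an instance
    // fails to come back. Returns the instances that were upgraded before any halt.
    static List<String> deploy(List<String> instances, Predicate<String> redeploy) {
        List<String> upgraded = new ArrayList<>();
        for (String instance : instances) {
            if (!redeploy.test(instance)) {
                break; // halt: later instances are never stopped, so they keep serving the old version
            }
            upgraded.add(instance);
        }
        return upgraded;
    }
}
```

If the second of three instances fails, only the first is reported as upgraded and the third is never touched, which is what keeps at least one instance running.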
Most of our monitoring needs are provided by New Relic (application performance, infrastructure, uptime/availability, alerting, dashboards). Graylog is used to aggregate our logs onto a single server where they can be easily searched. Finally, we use VoIPmonitor for deep insight into the quality and performance of our VoIP traffic, plus some reporting and alerting in case of anomalies.