Overengineering is a word that a software engineer hears from time to time throughout their career. I recalled a design case I encountered a few months back that perfectly explains what overengineering looks like in practice.

The Definition of Overengineering

On Wikipedia, overengineering is defined as “the act of designing a product or providing a solution to a problem in an elaborate or complicated manner, where a simpler solution can be demonstrated to exist with the same efficiency and effectiveness as that of the original design.” I would simplify it as: choosing a more complicated solution to a problem when simpler ones exist.

How to Avoid Overengineering

The tricky thing is that, in practice, it is usually hard to identify overengineering in a system you designed yourself.

In my experience, apart from equipping yourself with broad knowledge and hands-on experience, the best option is to leverage collective wisdom, i.e. consult as many people as you can.

Case Study

A few months ago, my team had a design decision to make for a web system under development. The scenario can be simplified as:

  • A group of users would request and compete for a limited number of resources in a pool (fewer than 500). At peak time, the QPS was estimated at around 5K.
  • Users would request the resources via REST API calls. Requests would be served first come, first served, without preemption.
  • Each resource is rate limited: if it receives more than 20 requests per minute, it returns an error and becomes unavailable for a few minutes.
  • VIP users could be assigned a resource exclusively, still subject to the “20 requests per minute” rule.
  • According to internal tests (run by people outside the dev team), the probability that a user's request would take more than 3 seconds to finish was high (roughly more than 98%).

The design targets two issues in this scenario:

Resources competition.

How should the system handle multiple users competing for a limited number of resources?

A single resource receives more than 20 requests per minute.

How should the system handle a resource that receives more than 20 requests per minute from a VIP user?

One of my colleagues instantly proposed using message queues as the core of the solution. To be specific:

For “Resources Competition”

  • All user requests are put into a message queue and pulled out in FIFO style.
  • A worker picks a request and assigns a resource as soon as one is available in the pool.

For “More than 20 Requests issue”

  • Assign a message queue to each resource, store the requests sent by the user in that queue, and have a worker pull requests from the queue in FIFO style.
  • Only let requests be pulled from the queue at a rate of at most 20 per minute.

The solution is shown below.

Fig.1 - MQ Solution


Everything seemed perfectly “reasonable” at that point. However, that same evening I gave the problem a second thought and realised that message queues might be an overengineered solution.

Resources Competition

From User Experiences Perspective

On the one hand, if a message queue is not used, then once all the resources are assigned, a user receives an error for the resource request along with a message asking them to retry after a few minutes. On the other hand, if a message queue is adopted and all the resources are assigned, the user's request is stored in the queue and the user has to wait for a response for an unpredictable amount of time, which does not improve the user experience compared to the first case.

From System Performance Perspective

A message queue does not add resources to the pool, so it cannot increase the number of requests the system is able to serve; it only adds another component to run.

Therefore, for “Resources Competition”, adopting a message queue is an overengineered option.
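The queue-free alternative for resource competition can be sketched in a few lines. This is only an illustration under assumptions: the names `ResourcePool` and `PoolExhausted` are hypothetical, and the 120-second retry hint is an assumed stand-in for “a few minutes”.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class PoolExhausted(Exception):
    """Raised when every resource is assigned; the caller should retry later."""

    def __init__(self, retry_after_seconds: int) -> None:
        super().__init__(f"no resource available, retry in {retry_after_seconds}s")
        self.retry_after_seconds = retry_after_seconds


@dataclass
class ResourcePool:
    free: List[str]
    retry_after_seconds: int = 120  # assumed stand-in for "a few minutes"
    assigned: Dict[str, str] = field(default_factory=dict)  # user -> resource

    def acquire(self, user: str) -> str:
        """Hand out a free resource, or fail immediately if none is left."""
        if not self.free:
            raise PoolExhausted(self.retry_after_seconds)
        resource = self.free.pop()
        self.assigned[user] = resource
        return resource

    def release(self, user: str) -> None:
        """Return the user's resource to the pool."""
        self.free.append(self.assigned.pop(user))
```

The API layer would catch `PoolExhausted` and turn it into the “retry after a few minutes” message, with no queue and no worker involved.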

More than 20 Requests issue

As an alternative, I considered forcing a minimum hold time of 3 seconds per request (60 seconds / 20 requests = 3 seconds). That is, if a user takes ownership of the resource and finishes in less than 3 seconds, the system makes the user wait until 3 seconds have passed, counted from taking ownership of the resource to releasing it. This ensures that the user will not trigger the “20 requests per minute” error.
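A minimal sketch of the forced wait, assuming the request is a plain callable; the names `use_resource` and `remaining_wait` are made up for illustration, and the clock and sleep functions are injectable only so the behaviour can be checked without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RATE_LIMIT_PER_MINUTE = 20
MIN_HOLD_SECONDS = 60 / RATE_LIMIT_PER_MINUTE  # 3.0 seconds per request


def remaining_wait(elapsed_seconds: float, min_hold: float = MIN_HOLD_SECONDS) -> float:
    """Seconds still to wait so that the resource is held for at least min_hold."""
    return max(0.0, min_hold - elapsed_seconds)


def use_resource(
    work: Callable[[], T],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    min_hold: float = MIN_HOLD_SECONDS,
) -> T:
    """Run the request, then pad the hold time up to min_hold before releasing."""
    started = clock()
    try:
        return work()
    finally:
        # Requests slower than min_hold get a zero-second sleep.
        sleep(remaining_wait(clock() - started, min_hold))
```

A request that finishes after 1 second is padded by 2 more seconds, while a request that already took longer than 3 seconds is not delayed at all.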

From User Experiences Perspective

Given that a user's request has a more than 98% chance of taking more than 3 seconds to complete, forcing a wait only when a request finishes in under 3 seconds will not significantly affect the user experience.

From System Performance Perspective

One message queue per resource adds a heavy burden on the backend system, while it does not improve system performance in any way compared to the forced-wait method.

Therefore, for “More than 20 Requests issue”, using a message queue is an overengineered option too.

With these arguments and conclusions in mind, I later convinced the team not to adopt message queues, which was a relief for the backend dev team :).


Recently, I helped some small businesses (fewer than 20 people) integrate a payment module into their systems. I think the experience is worth sharing, as it also applies to cases such as selling code on GitHub or selling online courses on a personal website.

Type 1: The payment platform manages everything for you

There are some payment platforms that manage everything for you. You do not need to design the frontend UIs of the “placing order” page (choosing the products listed on the page) or the “paying” page (filling in credit card information and actually transferring the money), not to mention the backend logic behind the UIs. The platform does all of this for you. Let us take Lemon Squeezy as an example (I swear I did not receive an advertising fee from Lemon Squeezy, but maybe they should pay me one :D).

To integrate Lemon Squeezy into a system, all a software engineer needs to do is:

  • Set up a store page on Lemon Squeezy and fill it with product information and payment information, as shown below. Customers complete every step of the purchase on the store page: selecting products, filling in payment information and transferring money.

    Fig.1 - An example of a Lemon Squeezy store page
  • Add a link on the frontend page from which you would like to redirect customers to the Lemon Squeezy store page.

  • Add a page for customers to activate the keys received from Lemon Squeezy. After a customer completes payment on the store page, Lemon Squeezy returns an activation key to the customer, who then uses the key to claim ownership of the product on the website. On the backend of the website, activating the key takes two steps: 1) validate the key with Lemon Squeezy; 2) activate the key with Lemon Squeezy.
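The backend side of the activation page can be sketched as below. This does not use the real Lemon Squeezy API: `LicenseClient`, its two methods and their return types are hypothetical stand-ins for whatever wrapper you write around the platform, and `claimed` is a plain dict standing in for your own storage.

```python
from typing import Dict, Optional, Protocol


class LicenseClient(Protocol):
    """Hypothetical wrapper around the payment platform's key endpoints."""

    def validate(self, key: str) -> Optional[str]:
        """Return the order id for a valid key, or None if the key is invalid."""

    def activate(self, key: str) -> bool:
        """Mark the key as activated on the platform; True on success."""


def claim_product(client: LicenseClient, claimed: Dict[str, str], key: str) -> bool:
    """Validate, then activate, then record the key and its order id."""
    if key in claimed:
        return False  # the key has already been used on this website
    order_id = client.validate(key)  # step 1: validate the key
    if order_id is None:
        return False
    if not client.activate(key):  # step 2: activate the key
        return False
    claimed[key] = order_id  # the only data the website has to keep
    return True
```

Keeping the platform calls behind a small interface like this also makes the activation page easy to test without touching the real store.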

With such a payment platform, the software engineer only needs to save the activation key and the order id returned from Lemon Squeezy. Lemon Squeezy provides an orders page for store owners to check order information, as shown below.

Fig.2 - The orders page provided by Lemon Squeezy

Cons:

There is no perfect solution in reality. The cons of using Lemon Squeezy that I observed in my own experience are:

  • User experience. Redirecting to a totally separate payment page hurts the online purchasing experience. In some countries, the page may take up to 30 seconds to load.

  • Trust issue. As the domain name of the Lemon Squeezy store page is different from that of the source site, it may create a trust issue for potential customers.

  • Less control. A highly managed platform leaves little room for customization. For instance, the layout of the store page UI is outside your control.

Type 2: Manage everything on your own

For this type of platform, a typical workflow is:

select products --> select payment method --> place the order with the platform via REST API --> the platform returns a QR code link for the customer to pay --> the platform notifies the website that the payment has completed

in which you have everything under control, i.e. you design the UIs of the products page and payment page and handle all the backend logic.
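The backend part of that workflow might look roughly like the sketch below. Everything here is illustrative: `Checkout`, `Order` and the `create_payment` callable are hypothetical names, and a real integration would also need to verify that the notification really comes from the platform, which this sketch leaves out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class OrderStatus(Enum):
    PENDING = "pending"
    PAID = "paid"


@dataclass
class Order:
    order_id: str
    product: str
    payment_method: str
    qr_code_url: str
    status: OrderStatus = OrderStatus.PENDING


class Checkout:
    def __init__(self, create_payment: Callable[[str, str, str], str]) -> None:
        # create_payment stands in for the platform's REST call; it is assumed
        # to take (order_id, product, payment_method) and return a QR code link.
        self._create_payment = create_payment
        self._orders: Dict[str, Order] = {}

    def place_order(self, order_id: str, product: str, payment_method: str) -> Order:
        """Place the order with the platform and keep the returned QR code link."""
        qr_code_url = self._create_payment(order_id, product, payment_method)
        order = Order(order_id, product, payment_method, qr_code_url)
        self._orders[order_id] = order
        return order

    def on_payment_notification(self, order_id: str) -> bool:
        """Handle the platform's callback; returns False for unknown orders."""
        order = self._orders.get(order_id)
        if order is None:
            return False
        order.status = OrderStatus.PAID  # repeated notifications are harmless
        return True
```

The frontend shows `qr_code_url` to the customer, and the order only becomes `PAID` once the notification arrives.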

Pros and Cons

The con of adopting this type of platform is that it may take time to build, since everything is under your control. The pros are the opposite of the cons of the type 1 platform.


Recently, I have been busy working on a project built from scratch, from business idea to a software product running online. I completed the design and most of the coding myself (and I was deeply involved in the DevOps work as well). The project is about to go online. I would like to spend a little time discussing its system design.

Business Idea

As this is a chit-chat about system design in a real case, I will briefly introduce the business without going into too many details.

The system provides a service for registered users based on a purchased usage quota, i.e. every time a user consumes the service, the user's remaining usage quota is reduced by ONE until it reaches ZERO. Then the user needs to purchase more of the service.

In essence, the system needs:

  • a user management service
  • a mechanism that is able to change the user data in ‘real time’

System Design

System Design for the Business

User Management Service

For a typical user management module, I chose MongoDB for the following 3 reasons:

  • The user schema was not fixed at the beginning. To avoid the trouble of changing the schema in a relational DB, I went with NoSQL.

  • MongoDB is JSON-document based and, for this project, offered better sharding and scalability.

  • To change the user data in ‘real-time’, I planned to cache user data in memory for fast I/O, and I only wanted to cache part of the user data rather than all of it, so NoSQL was the better option.

Change User Data in ‘Real-time’

As I mentioned, every time a user consumes the service, the usage quota decreases by ONE. Since the service could be used by multiple users at the same time, the system must not spend too much time on I/O when changing the user data. Hence, I decided to use an in-memory cache instead of updating the data directly in MongoDB. That is where Redis comes into play.

As shown in the figure, when a user logs in, the user data is loaded into Redis. Every time the user consumes the service, the backend updates the data in Redis, and it only writes back to MongoDB when the user logs out.
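The login, consume and logout flow can be sketched as follows. To keep the example self-contained, two plain dicts stand in for MongoDB and Redis; `QuotaService` and its method names are hypothetical, not the project's actual code.

```python
from typing import Dict


class QuotaService:
    def __init__(self, database: Dict[str, int]) -> None:
        self._db = database  # stands in for MongoDB: user -> remaining quota
        self._cache: Dict[str, int] = {}  # stands in for Redis

    def login(self, user: str) -> None:
        """Load the user's quota from the database into the cache."""
        self._cache[user] = self._db[user]

    def consume(self, user: str) -> bool:
        """Use the service once; only the cached value is touched."""
        remaining = self._cache[user]
        if remaining <= 0:
            return False  # quota used up, the user has to purchase more
        self._cache[user] = remaining - 1
        return True

    def logout(self, user: str) -> None:
        """Write the cached quota back to the database and drop it from the cache."""
        self._db[user] = self._cache.pop(user)
```

Between login and logout the database value lags behind the cache in this sketch, which is the price paid for keeping database I/O off the hot path.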

Sync user data between frontend and backend

On the frontend, the user may need to see the remaining service usage in ‘real-time’. There are two choices:

  • The frontend always keeps its data in sync with the backend, meaning that every time the service is used, the frontend invokes the backend REST API and waits for the returned result.
  • The frontend and backend use different data sets. To be more specific, the frontend caches the user data in react-redux (and only works with the data from react-redux). Every time the user consumes the service, the frontend changes the number in react-redux and, at the same time, invokes the REST API to update the number in Redis on the backend.

With the 1st choice, the frontend always shows the correct data, but at the cost of time. With the 2nd, the frontend may show different data from the backend (if something goes wrong on the backend when updating the data in Redis), but it does not need to wait for the result of the REST API call.

I went for the 2nd choice for speed.

Locking the Service

The service will be used by multiple users at the same time and it is not sharable. Therefore, I need a distributed lock here.

Since I had already introduced Redis for caching, I used Redis Redlock for distributed locking.

These are the design decisions I made during the project. System design is always about trade-offs: space, time, cost. The most important one: do not over-design. The priority is to meet the business requirements, not to create a technically perfect product.
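To give a feel for what such a lock does, here is a heavily simplified sketch of the majority idea behind Redlock. It is not the project's code and not a Redis client: `FakeNode` is an in-memory stand-in for one Redis instance, the function names are made up, and the sketch leaves out the elapsed-time and clock-drift checks that the real algorithm performs.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class FakeNode:
    """In-memory stand-in for one independent Redis instance."""

    def __init__(self) -> None:
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (token, expiry)

    def set_if_absent(self, key: str, token: str, now: float, ttl: float) -> bool:
        """Take the key only if it is free or its previous holder has expired."""
        holder = self._locks.get(key)
        if holder is not None and holder[1] > now:
            return False
        self._locks[key] = (token, now + ttl)
        return True

    def delete_if_owner(self, key: str, token: str) -> None:
        """Release the key only if this token still owns it."""
        holder = self._locks.get(key)
        if holder is not None and holder[0] == token:
            del self._locks[key]


def release(nodes: List[FakeNode], key: str, token: str) -> None:
    for node in nodes:
        node.delete_if_owner(key, token)


def acquire(nodes: List[FakeNode], key: str, ttl: float, now: float) -> Optional[str]:
    """Return a token if a majority of nodes granted the lock, else None."""
    token = uuid.uuid4().hex
    granted = sum(node.set_if_absent(key, token, now, ttl) for node in nodes)
    if granted >= len(nodes) // 2 + 1:
        return token
    release(nodes, key, token)  # undo any partial acquisition
    return None
```

The random token is what keeps one user from releasing a lock that another user currently holds, and the expiry keeps a crashed holder from blocking the service forever.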



Jingjie Jiang


Find a place I love the most