Thrift Load Balancing

Thrift is an RPC framework that can be used to transport data quickly over TCP. Most of the time Thrift is used for server-to-server data transfer, for example to publish real-time logs to log servers. I have worked with Thrift as a PEP-PDP communication mechanism. If you like, you can find more details on PEP-PDP communication here. The PEP (Policy Enforcement Point) is a client-side component, whereas the PDP (Policy Decision Point) is a server-side component.

In an actual production deployment there can be many PDP nodes, and a load balancer distributes the load across them. Therefore we may need a load balancer that can handle Thrift traffic. That part is fine. The issue is that we need to maintain a session between the PEP and the PDP. How can we achieve this?

The best option is to replicate the Thrift session across every node. But some servers do not support this, because the session object caches heavy objects that cannot be serialized, so it cannot be distributed across nodes. The load balancer would then need to use sticky sessions. But this is not an option in some cases, as I have described here. Normally hardware load balancers are used for Thrift, and most of the time they cannot handle sticky sessions. However, there are still two options:

1. Sending authentication credentials in each request

This works as expected, but the approach has three main drawbacks:

  • The credentials have to be validated on the server side for every request. This can cause a performance bottleneck.
  • Your user store may have policies that do not allow users to authenticate on every request.
  • It is not good practice to pass the credentials over the communication channel every time. This is usually acceptable, though, because Thrift is not normally used for communication over the Internet; it is mostly used in a LAN setup for server-to-server communication. An SSL transport can also be used easily to secure the credentials.

We can resolve the first two drawbacks by caching the user credentials on the server side. When a client sends its first request, the server creates the Thrift session and keeps the combination of the username and the hashed password as the key for that session. In other words, the combination of username and hashed password is the Thrift session identifier. When the next request arrives, the server does not need to verify the credentials against the user store, because the Thrift session has already been created. Instead, the credentials can be validated against the Thrift session identifier.
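The caching idea above can be sketched roughly as follows. This is only a minimal illustration, not the code of any particular server: the class name `SessionCache`, the `verify_with_user_store` callback (a stand-in for the real user store lookup) and the choice of SHA-256 as the hash are all assumptions made for the example. It also leaves out session expiry, which a real server would need.

```python
import hashlib
from typing import Callable, Dict, Optional


class SessionCache:
    """Hypothetical server-side cache of sessions keyed by username + hashed password."""

    def __init__(self, verify_with_user_store: Callable[[str, str], bool]):
        # Callback that performs the expensive check against the user store.
        self._verify = verify_with_user_store
        self._sessions: Dict[str, dict] = {}

    @staticmethod
    def session_key(username: str, password: str) -> str:
        # The session identifier is the username combined with the hashed password.
        digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
        return f"{username}:{digest}"

    def authenticate(self, username: str, password: str) -> Optional[dict]:
        key = self.session_key(username, password)
        session = self._sessions.get(key)
        if session is not None:
            # Session already exists: no round trip to the user store.
            return session
        if not self._verify(username, password):
            return None
        # First successful request: create the session and remember it.
        session = {"username": username}
        self._sessions[key] = session
        return session
```

With this in place, any node that receives a request carrying credentials it has already seen can answer from its own cache, and only the first request per node reaches the user store.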

2. Client side load balancing

You need to implement everything the load balancer does from scratch, which can take a considerable amount of time. Also, the client must know all the internal server URLs (not only the load balancer URL). However, this approach gives you a lot of flexibility.
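As a rough idea of what the client has to implement, here is a minimal round-robin balancer with failover. It is a sketch under assumptions: the class name `RoundRobinBalancer` and the host names are hypothetical, and the `request` callable stands in for whatever opens a Thrift connection to the given node and performs the call.

```python
import itertools
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Endpoint = Tuple[str, int]  # (host, port) of one PDP node


class RoundRobinBalancer:
    """Hypothetical client-side balancer: rotates over nodes, skips unreachable ones."""

    def __init__(self, endpoints: List[Endpoint]):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._cycle = itertools.cycle(range(len(self._endpoints)))

    def call(self, request: Callable[[Endpoint], T]) -> T:
        last_error = None
        # Try each node at most once per call, starting from the next in rotation.
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[next(self._cycle)]
            try:
                return request(endpoint)
            except ConnectionError as exc:
                # Node is down or unreachable: fall through to the next one.
                last_error = exc
        raise ConnectionError("all PDP nodes are unreachable") from last_error


# Usage with made-up internal addresses:
balancer = RoundRobinBalancer([("pdp1.internal", 9090), ("pdp2.internal", 9090)])
```

Because the client chooses the node itself, it could just as well pin all of its requests to one node and switch only on failure, which would keep its session on that node. That is the kind of flexibility this option offers.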