The internet still needs Multicast
I've been having some interesting discussions with some of my colleagues recently about the challenges of HTTP and dynamic content on web pages. By dynamic content I mean things like market data or cricket scores that can change constantly. What we would ideally like to do is push data from the server to browsers every time something changes, but doing this over HTTP requires some degree of compromise:
a) Polling - the browser polls the web server periodically to get a newer copy of a piece of data. The data is only as up to date as the last poll, so if we poll once every 5 seconds we may end up using data that is up to 5 seconds old. The load on the server also goes up in proportion to the number of browsers polling for data, rather than how often the data actually changes. This isn't ideal from a scalability point of view.
b) Persistent Connection - the browser holds open a connection to the server, and the server sends data down that connection when an event occurs, in this case when a piece of data changes. The problem here is the number of concurrent open connections we have to support on the server side: if we have 30,000 browsers watching a piece of data then we need 30,000 open connections. This problem can show itself as IIS or Apache running out of sockets, and browsers getting connection timeouts when trying to access the site. Again, the load on the server is in proportion to the number of clients and not the frequency of data change.
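To put rough numbers on the two compromises above, here is a minimal back-of-the-envelope sketch in Python. The function names are my own, made up for illustration, and the model assumes every browser behaves identically. The 30,000 browsers and the 5 second interval are the figures used above; the rate of 2 updates per second is an invented example value.

```python
def polling_requests_per_second(browsers: int, poll_interval_s: float) -> float:
    """Requests per second reaching the server when every browser polls on a fixed interval."""
    return browsers / poll_interval_s


def persistent_connections(browsers: int) -> int:
    """Open connections the server must hold: one per watching browser."""
    return browsers


def pushes_per_second(browsers: int, updates_per_second: float) -> float:
    """Messages per second the server writes when it pushes every update down every connection."""
    return browsers * updates_per_second


if __name__ == "__main__":
    browsers = 30_000
    print("polling load (req/s):", polling_requests_per_second(browsers, 5))
    print("open connections:", persistent_connections(browsers))
    # Assumed figure: 2 updates per second, purely for illustration.
    print("push load (msg/s):", pushes_per_second(browsers, 2))
```

In every case the number of browsers appears as a multiplier, which is the scalability problem in a nutshell.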
I used to work on market data systems, specifically equity market data, where we could have thousands of updates per second. Having each trader's PC open a point-to-point connection to the server would have required very expensive and fast servers, so instead we used broadcast-style network protocols: UDP broadcast or IP Multicast. The load on the server then went up in proportion to the number of changes to the data and not the number of clients, which is far more scalable and economical.
This sounds like a good solution to sending changes to browsers. While UDP broadcast only works well on a LAN, the IP Multicast protocol is designed to work across multiple LAN segments in a scalable fashion. I'll not bore you all with details of how the protocol works, but what is interesting is that most routers used within the internet can support this protocol already.
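For anyone curious what IP Multicast looks like from the application's side, here is a minimal sketch using Python's standard socket module. It is an illustration under stated assumptions, not how any particular market data system was built: the group address 239.1.1.1, the port 5007 and all the function names are made up for the example, and whether packets actually cross a router depends entirely on how the network is configured.

```python
import socket
import struct

GROUP = "239.1.1.1"  # assumption: an address picked from the administratively scoped 239.0.0.0/8 range
PORT = 5007          # assumption: an arbitrary UDP port


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure: the group address followed by the local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def make_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket for publishing; the TTL limits how many router hops a packet may cross."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def make_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Create a UDP socket bound to the port and ask the network stack to join the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock


def publish(sock: socket.socket, payload: bytes, group: str = GROUP, port: int = PORT) -> int:
    """Send one datagram to the group; the sender makes this single call however many receivers exist."""
    return sock.sendto(payload, (group, port))
```

A subscriber would call make_receiver() and loop on recvfrom(), while the server calls publish(make_sender(), b"...") once per change. The point to notice is that the sender never keeps a list of clients.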
Suppose I had 100 subscribers for updates on the same LAN segment, each needing updates from a remote server over a WAN link. The IP Multicast protocol is designed to send only one packet to the router over the WAN link, and the router then fans out that update. In a traditional connection-oriented setup each of those subscribers would have made their own point-to-point connection to the remote server, and the same piece of data would have been sent over the link to the router 100 times!
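The arithmetic of that example can be written down as a small Python sketch. The function names are hypothetical, and the model is deliberately simplified: it counts one packet per update and ignores retransmissions, protocol overhead and group membership traffic.

```python
def wan_packets_unicast(subscribers: int, updates: int) -> int:
    """Point-to-point: every update crosses the WAN link once per subscriber."""
    return subscribers * updates


def wan_packets_multicast(subscribers: int, updates: int) -> int:
    """Multicast: every update crosses the WAN link once, provided someone behind the router has joined."""
    return updates if subscribers > 0 else 0


if __name__ == "__main__":
    # The example from the text: 100 subscribers behind one router, one update.
    print("unicast packets over the WAN link:", wan_packets_unicast(100, 1))
    print("multicast packets over the WAN link:", wan_packets_multicast(100, 1))
```

Adding more subscribers on that LAN segment changes the first number and leaves the second one alone.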
Multicast protocols are actually supported by the common multimedia players as well. The idea is that if a few thousand people on the same ISP are watching the same live event, only one stream needs to be sent to the ISP, which can then fan that stream out in an efficient manner. The reality seems to be that this doesn't happen: all the people watching the event make point-to-point connections, and the load on the internet and the server is thousands of times larger than it needs to be.
So why isn't multicast used? One reason suggested is that the ISPs would need to agree on the required peering relationships; they'd have to agree to allow multicast to pass between each other's networks. Another challenge is how to handle the millions of different multicast groups that would be required to support the number of streaming sources on the internet. So for now we are stuck with inefficient connection-oriented protocols for data flows that are really connectionless and broadcast in nature. As the amount of streaming dynamic data on the internet increases, is it time to look again at multicast?


