From camera to customer faster than ever before
We have photographers all over the world covering movie premieres, football (and futbol!) games and war-torn areas, so managing and delivering that content is a challenge we’ve had to overcome to ensure our customers have the powerful content they need, faster than ever before.
To kick off our new Technology series on the Getty Images blog, I wanted to give an overview of the automated asset delivery system we use for Sports, News and Entertainment imagery, commonly known within Getty Images as Feed Manager. In a nutshell, Feed Manager is our front-line system for both receiving imagery from photographers and delivering it to our websites and customers. It recently underwent a major overhaul to deal with scalability and new business rule requirements and, very importantly for our customers, to improve our time to market.
The general flow of our Editorial imagery is pretty simple. We get a photograph, we add both subjective and objective information, we figure out who should get it and we deliver the image. The challenge is in the volume of imagery we handle, the vast criteria we support for deciding what goes where and the speed from camera to customer.
On average we will get around 25,000 incoming images per day ready to deliver to customers. During heavy load, such as New York Fashion Week, that number can be more than 50,000. Of course, the actual number of images taken is significantly higher; the ready-to-deliver images represent only those which have been approved to send. Because each image gets delivered to multiple customers, that 25,000 incoming image count translates into an average outgoing rate of 750,000 images per day. During the Royal Wedding weekend in April we sent out more than 3 million images, with a peak rate of 175,000 per hour!
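To put those figures in perspective, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; the variable names are just for illustration.

```python
# Figures quoted in the post.
incoming_per_day = 25_000    # approved images arriving on an average day
outgoing_per_day = 750_000   # deliveries made on an average day
peak_per_hour = 175_000      # peak hourly rate during the Royal Wedding weekend

# Average number of deliveries generated by each incoming image.
fan_out = outgoing_per_day / incoming_per_day

# Sustained delivery rates, in images per second.
avg_per_second = outgoing_per_day / (24 * 60 * 60)
peak_per_second = peak_per_hour / (60 * 60)

print(f"fan-out: {fan_out:.0f} deliveries per image")
print(f"average: {avg_per_second:.2f} deliveries per second")
print(f"peak:    {peak_per_second:.2f} deliveries per second")
```

In other words, each approved image goes to about 30 destinations on average, and the peak hour ran at more than five times the average daily rate.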
Our Feed Manager system is responsible for receiving the image, validating the data on the image, figuring out where it needs to go and queuing the image for delivery. To handle the complexity of the system and the scalability required we broke down the system into three major independent parts: Ingestion, the Feed Engine and Delivery.
Each part is independent from the others, and has its own database, API and server configuration. The separation allows us to develop and tune each component independently and provides a great deal of flexibility on how other systems interact with Feed Manager. For instance, we recently introduced on-demand delivery where specific images can be pushed out via the Delivery system to customers without having to go through the automated engine.
Ingestion and Delivery are C# Windows services built with the very nice Topshelf framework. The Feed Engine is a WCF service that sits within IIS. All services have recently been refactored to leverage the Task Parallel Library to provide asynchronous I/O and concurrency where appropriate.
The Path of an Image
So what happens to an image once we receive it? Quite a bit of stuff happens beforehand, but by the time the image hits Ingestion it is considered “ready to go.” That means it has been approved and has required information embedded in both IPTC and XMP regarding who took the image, where it was taken, what the image is, who is in it… as well as a slew of other important data points.
The system receives images via FTP. A custom C# module embedded within IIS assigns a unique identifier to the image and sends a message to a SQL Service Broker queue. The Ingestion Windows service listens for messages on the queue and handles each within a newly spawned thread. The image is downloaded from the FTP server, inspected and validated; its metadata is pulled out and processed; and then the file heads to a common storage area sitting behind another internal FTP server.
Once the image is stored, we hand over control to the Feed Engine, the real heart of the system, by calling a WCF service which receives only the metadata from the image. The Feed Engine is a rules engine, and it stores the information on who should get which image.
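The shape of that ingestion loop can be sketched in a few lines. This is a simplified illustration and not our production code: the real services are C# and talk to SQL Service Broker and FTP, whereas this sketch uses Python with an in-memory `queue.Queue` standing in for the broker. The required field names, `enqueue_upload`, `handle_message` and `run_ingestion` are all hypothetical.

```python
import queue
import threading
import uuid

# Hypothetical required fields; the real validation covers far more metadata.
REQUIRED_FIELDS = ("photographer", "location", "caption")


def enqueue_upload(broker, metadata):
    """Stand-in for the FTP module: tag the upload with an ID and queue a message."""
    image_id = str(uuid.uuid4())
    broker.put({"image_id": image_id, "metadata": metadata})
    return image_id


def validate(metadata):
    """Return the list of required fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def handle_message(message, stored, rejected, lock):
    """Validate one image and record whether it was stored or rejected."""
    missing = validate(message["metadata"])
    with lock:
        if missing:
            rejected.append((message["image_id"], missing))
        else:
            stored.append(message["image_id"])


def run_ingestion(broker, stored, rejected):
    """Listen on the queue and handle each message in a newly spawned thread."""
    lock = threading.Lock()
    workers = []
    while True:
        message = broker.get()
        if message is None:  # sentinel: no more messages in this sketch
            break
        worker = threading.Thread(
            target=handle_message, args=(message, stored, rejected, lock)
        )
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
```

The queue decouples the upload from the processing: the receiving side only has to assign an ID and post a message, and the listener decides how much work to run in parallel.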
Any information on the image can be used in a rule: keyword, category, subjective quality, location — in any combination and in any number. This flexibility allows for very tailored content feeds for our customers.
An example would be “I want all important news images taken around the world, every entertainment image you have from North America and only soccer or cricket images relating to European countries.” We have lots of rules to determine what needs to go where and to whom. All rules are stored in a SQL database, but in order to process these efficiently, we dynamically compile C# code from an XML representation of the rule. The dynamic code compilation creates a single function in a single class which returns true if the current image metadata matches the given rule and false if it does not.
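To make the idea concrete, here is a minimal sketch of compiling an XML rule into a callable function. It is written in Python for brevity, while the real engine generates C#. The XML vocabulary (`rule`, `any`, `all`, `equals`, `in`), the metadata field names and the `compile_rule` helper are assumptions made up for this example and are not our actual rule format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for the example rule described above.
RULE_XML = """
<rule>
  <any>
    <all>
      <equals field="category" value="news"/>
      <equals field="importance" value="high"/>
    </all>
    <all>
      <equals field="category" value="entertainment"/>
      <equals field="region" value="North America"/>
    </all>
    <all>
      <in field="sport" values="soccer,cricket"/>
      <equals field="region" value="Europe"/>
    </all>
  </any>
</rule>
"""


def to_expression(node):
    """Translate one XML node into a Python boolean expression string."""
    if node.tag in ("all", "any"):
        parts = [to_expression(child) for child in node]
        if not parts:  # empty group: 'all' is vacuously true, 'any' is false
            return "True" if node.tag == "all" else "False"
        joiner = " and " if node.tag == "all" else " or "
        return "(" + joiner.join(parts) + ")"
    if node.tag == "equals":
        return f"(m.get({node.get('field')!r}) == {node.get('value')!r})"
    if node.tag == "in":
        values = tuple(v.strip() for v in node.get("values").split(","))
        return f"(m.get({node.get('field')!r}) in {values!r})"
    raise ValueError(f"unknown rule element: {node.tag}")


def compile_rule(rule_xml):
    """Generate source for a single function and compile it once."""
    root = ET.fromstring(rule_xml)
    source = f"def matches(m):\n    return {to_expression(root[0])}\n"
    namespace = {}
    exec(compile(source, "<rule>", "exec"), namespace)
    return namespace["matches"]


rule = compile_rule(RULE_XML)
print(rule({"category": "news", "importance": "high", "region": "Asia"}))
print(rule({"category": "entertainment", "region": "Europe"}))
```

The XML is parsed and translated once, when the rule is compiled. After that, evaluating a rule against an image is just a function call, so there is no need to walk a rule tree for every image.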
Because rules are normalized and stored in a database, we can easily search and change them as needed. And because they get compiled into C# code, we can process thousands of rules near-instantaneously. The average time per image to go from our FTP server, through storage and ingestion and through the feed engine is only two seconds (and when we need to we can get that number even lower). All this on only four virtual servers, split two-and-two between our Ingestion and Feed Engine systems.
Once the Feed Engine has done its thing, we queue the image up for delivery. At this point, we have a list of who should get the image. Each customer has their own queue, and we prioritize within the queue so the best images get delivered first if there is a backlog. However, images do not sit in the queue very long; we can customize how much we send to each customer at once to avoid flooding. As long as a customer can keep up with our pace, they get the image without any delay. We have a total of 10 virtual servers for delivery, with two eight-core physical servers dedicated to customers who want resizing or visual indication done before delivery.
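The queueing behavior can be sketched roughly like this. It is an illustration in Python and not the C# service itself, and it assumes a simple numeric priority where a lower number means a more important image. The `DeliveryQueues` class, its method names and the customer and image names are hypothetical.

```python
import heapq
import itertools
from collections import defaultdict


class DeliveryQueues:
    """One priority queue per customer, drained in limited batches."""

    def __init__(self):
        self._queues = defaultdict(list)
        # Tie-breaker so images with equal priority leave in arrival order.
        self._counter = itertools.count()

    def enqueue(self, customer, image_id, priority):
        """Queue an image for a customer; a lower priority number goes out first."""
        entry = (priority, next(self._counter), image_id)
        heapq.heappush(self._queues[customer], entry)

    def next_batch(self, customer, batch_size):
        """Pop up to batch_size images, best first, to avoid flooding the customer."""
        heap = self._queues[customer]
        batch = []
        while heap and len(batch) < batch_size:
            _, _, image_id = heapq.heappop(heap)
            batch.append(image_id)
        return batch


queues = DeliveryQueues()
queues.enqueue("wire-service", "img-a", priority=2)
queues.enqueue("wire-service", "img-b", priority=1)
queues.enqueue("wire-service", "img-c", priority=2)
print(queues.next_batch("wire-service", batch_size=2))
```

Because every customer has a separate queue, one slow recipient only builds a backlog for itself, and the batch size is the knob that controls how much each customer receives at once.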
We will be adding more insight into our Feed Manager system on this blog, so stay tuned. Our queue and web service-based approach has allowed the individual components to scale and evolve independently, and our use of code generation and concurrency with the TPL has allowed fast processing under high load. This allows us to deliver the world’s best imagery in a fast, scalable and reliable way, so our customers can have the best content at their fingertips.
Editor’s note: The author of this post, Michael Hamrah, is the Director of Engineering for Editorial Technology in New York. You can follow him on Twitter via @mhamrah.