Replaces C++ code with a native approach

Dropbox recently described how it made its camera uploads process for Android faster and more reliable. Dropbox engineers removed the C++ code shared between Android and iOS and replaced it with a platform-native Kotlin implementation. The engineers are happy with the decision to rewrite the process, reporting that error rates have gone down and upload performance has improved greatly.

In their blog post, Sarah Tappon and Andrew Haigh, software engineers at Dropbox, advise that “it’s important to make sure the benefits justify the effort” when embarking on such a rewrite. “Ultimately, this will help you determine if you got the results you wanted.” They also advise shipping risky projects to a small audience “wide enough to give you the data you need to gauge success. Then watch and wait for your data to give you the confidence to continue.”

Metrics showing a decrease in error interactions and an increase in “all done” interactions after release

One of the main design constraints for camera uploads is Android’s strict limits on how often apps can run in the background and what they are allowed to do there. “For example, App Standby limits our background network access if the Dropbox app hasn’t been brought to the fore recently.” Under this limit, the app may be allowed to access the network for only a 10-minute interval once every 24 hours. The cross-platform version was not equipped to deal with platform-specific constraints, so it often failed. The Dropbox engineers, by contrast, designed the new Kotlin implementation around these limits.

As part of the rewrite, Dropbox engineers designed the new native process for performance. First, they introduced parallel uploads; the C++ version uploaded only one file at a time. Implementing parallel uploads with Kotlin coroutines was much easier than it would have been with manual thread management in C++. Second, they optimized memory usage by dynamically varying the number of concurrent uploads according to the amount of system memory available, and by reusing ByteArray buffers to reduce garbage collector pressure.
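
As a rough illustration of the two optimizations, here is a minimal sketch of bounded parallel uploads with Kotlin coroutines. It is not Dropbox's code: the names `maxConcurrentUploads`, `BufferPool`, and `uploadAll`, the one-slot-per-64-MiB policy, and the cap of four slots are all assumptions made for the example, and it assumes the `kotlinx-coroutines-core` library is on the classpath.

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical policy: one upload slot per 64 MiB of free memory, clamped to 1..4.
fun maxConcurrentUploads(freeMemoryBytes: Long): Int =
    (freeMemoryBytes / (64L * 1024 * 1024)).coerceIn(1L, 4L).toInt()

// Minimal buffer pool: hands back a recycled ByteArray when one is available,
// so steady-state uploads allocate no new buffers.
class BufferPool(private val bufferSize: Int) {
    private val free = ConcurrentLinkedQueue<ByteArray>()

    fun acquire(): ByteArray = free.poll() ?: ByteArray(bufferSize)

    fun release(buffer: ByteArray) {
        free.offer(buffer)
    }
}

// Uploads every photo while never running more than `limit` uploads at once.
// `uploadOne` is a stand-in for the real network call.
suspend fun <T> uploadAll(
    photos: List<T>,
    limit: Int,
    pool: BufferPool,
    uploadOne: suspend (T, ByteArray) -> Unit,
) {
    coroutineScope {
        val permits = Semaphore(limit)
        photos.forEach { photo ->
            launch {
                // Suspends (without blocking a thread) until a slot is free.
                permits.withPermit {
                    val buffer = pool.acquire()
                    try {
                        uploadOne(photo, buffer)
                    } finally {
                        pool.release(buffer)
                    }
                }
            }
        }
    }
}
```

A caller would compute `limit` from whatever free-memory figure the platform reports and pass it to `uploadAll`. The semaphore caps concurrency, and `coroutineScope` returns only once every upload has finished or one has failed.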

The Dropbox team used several methods to validate that the rewritten process works as expected. One of the critical techniques was to validate “many low-level components by running them in production alongside their C++ counterparts and then comparing the outputs.” This technique allowed the team to confirm that the new components were working properly before relying on their results.
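
The side-by-side validation described above can be sketched roughly as follows. This is an illustrative assumption, not the team's actual harness: `ShadowRunner` and its parameters are hypothetical names. The trusted legacy result is always what the caller receives, while the candidate's output is only compared and reported.

```kotlin
// Runs a trusted legacy function and a candidate implementation on the same input.
// Callers always get the legacy result; disagreements are reported, never returned.
class ShadowRunner<I, O>(
    private val legacy: (I) -> O,
    private val candidate: (I) -> O,
    private val onMismatch: (input: I, expected: O, actual: O?, error: Throwable?) -> Unit,
) {
    fun run(input: I): O {
        val expected = legacy(input)
        // A crash in the candidate must not affect production behavior.
        val attempt = runCatching { candidate(input) }
        val actual = attempt.getOrNull()
        if (attempt.isFailure || actual != expected) {
            onMismatch(input, expected, actual, attempt.exceptionOrNull())
        }
        return expected
    }
}
```

In practice `onMismatch` would write to a log or metrics pipeline, and the new component would take over only once the mismatch rate stayed at zero for long enough.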

Another technique the team used was to enforce state transitions more strictly. Each photo upload had a state assigned to it, and engineers proactively validated each state transition against a list of allowed transitions. Tappon and Haigh describe the result:

“These checks helped us catch a nasty bug early in our deployment. We started seeing a high volume of exceptions in our logs that were caused when camera uploads attempted to change photos from DONE to DONE. This made us realize that we were uploading photos multiple times!”

Possible photo upload state transitions
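
A transition check of this kind might look like the sketch below. Only the `DONE` state is named in the post, so the other states, the table of allowed transitions, and the class names are hypothetical, chosen to show how a DONE to DONE change would surface as an exception.

```kotlin
// Hypothetical states; only DONE appears in the original post.
enum class UploadState { NEW, UPLOADING, DONE, FAILED }

// Hypothetical list of allowed transitions. DONE is terminal.
val allowedTransitions: Map<UploadState, Set<UploadState>> = mapOf(
    UploadState.NEW to setOf(UploadState.UPLOADING),
    UploadState.UPLOADING to setOf(UploadState.DONE, UploadState.FAILED),
    UploadState.FAILED to setOf(UploadState.UPLOADING),
    UploadState.DONE to emptySet(),
)

class IllegalTransitionException(from: UploadState, to: UploadState) :
    IllegalStateException("Illegal photo state transition: $from -> $to")

class PhotoUpload(val id: String) {
    var state: UploadState = UploadState.NEW
        private set

    // Validates every change against the table before applying it.
    fun transitionTo(next: UploadState) {
        if (next !in allowedTransitions.getValue(state)) {
            throw IllegalTransitionException(state, next)
        }
        state = next
    }
}
```

With this table, a second attempt to mark an already uploaded photo as `DONE` throws instead of silently succeeding, which is the kind of signal that exposed the duplicate uploads.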

When it was time to deploy the new implementation, the team made sure it could fall back to the C++ implementation. The team also rolled the new implementation out first to an opt-in group of beta users. That group was large enough to surface rare errors and to collect key performance metrics such as upload success rate. The team monitored these metrics in the beta population for several months to make sure the implementation was ready to ship at scale. The team concludes that the months spent in beta paid off, and that the project was ultimately completed ahead of schedule.
