Let’s start off with a few simple questions…
- Have you ever worked for a client who has Reporting for their data?
(Most of developers have, since a business without Reporting is blind.)
- Do your users Export their data in Excel?
(Excel is still the default choice for Data Analysis, so it’s likely any business will offer the possibility to Export their report to Excel format.)
- Have you encountered having multiple users downloading a large Excel file, and ending up in Out Of Memory exception?
(We actually had this use case for a 750Gbs RAM server! This is caused by the fact that most Reporting solutions will create the Excel in server memory before downloading, which takes few Gbs RAM for each user’s Excel file!)
If your answer is YES to any or all of these questions, then I propose an efficient solution for you, by implementing your own Custom Export.
Note: This article is focused on Microsoft Reporting Services (SSRS), because it is coming with SQL Server as free, it is a mature solution (was first released in 2004), and has a lot of features.
However, with few adjustments the approach can be used for other Reporting solutions too, or you could create an Excel Exporter package to be re-used for different areas of your business.
Here are some metrics comparing the default SSRS export with a custom built one:
a. Basic test, 1 million rows, VM with 2.67 Ghz, 2 Cores, 2 logical processors, 16 Gb RAM
b. Load test, 13,000 rows, VM with 2.5Ghz, 1 Core, 2 logical processors, 8 Gb RAM
Conclusion: With new Export we could serve 10 times more concurrent users (Default Export was consuming all server resources with 5 concurrent users, with Custom Export we were able to server 50 concurrent users, and still not getting Out of Memory exception!)
Possible Solutions Researched:
- Adding more RAM
- This was not feasible, since even 750 Gb were not enough for SQL Server + SSRS
- Adding more Reporting servers
- This is for sure a costly solution
- If your business is already using SQL Server, it has a free SSRS license, to be used on the same server with SQL Server. Installing SSRS on each new server requires an extra SQL Server license.
- Imagine that an SQL server license costs $14,000 for each CORE!
- SSRS Custom Extensions
- This was not flexible enough to avoid loading the entire data in memory
- Other Reporting Solutions
- Besides the time to port the reports to a new Reporting solution, I searched two of them and did not seem to have fixed the Out of Memory issue.
- Power BI
- This is a powerful BI solution from Microsoft, but is extremely expensive, and it is more suitable for Self Service BI
- It costs as much as $14,985 / month
- Connect to SSRS from Excel directly
- This adds new concerns, such as creating custom authentication for multi-tenants clients, and would overload our DB servers, if you are not prepared to offer your clients a Self Service BI solution
- Custom Export (the focus of this article)
- Write our own Excel Exporter that would read data straight from DB, format it based on SSRS report definition, all this by processing data in chunks, without loading entire data / Excel in RAM memory
- In order to avoid needing to go to DB twice (View & Export), we save DB data as XML, and use them as data source for both View / Export
- So now the report has an XML on disk datasource, and does not connect to the DB directly
Custom Export Explained:
- Read data from DB row by row, save each row to disk
- Read XML from disk, row by row
- Create the Excel (row by row), save each row on disk
- Send the Excel to the browser, in chunks of bytes
As you can see, at each step we avoid loading entire data in memory, and read, process, save is done row by row, or in chunks of bytes.
Note: We used the disk due to implementation constraints. The idea is to avoid loading entire report into memory, and the OpenXML SDK cannot write directly into the Response stream. There is a paid third party from Telerik that might do that: https://docs.telerik.com/devtools/document-processing/libraries/radspreadstreamprocessing/overview
1. Read data from DB row by row, save each row to disk
a) Use ADO.NET DataReader to read data row by row
b) Write the data row on disk (as an xml node)
c) Call reading & writing methods from a parent method that actually creates the XML on disk (using delegate)
2. Read XML from disk, row by row
Note: the magic keyword that saves us is YIELD
We have a method that reads from disk a new row each time the IEnumberable.GetNext() is called.
So basically it is a nice way to decouple reading one row at a time, from the code that processes it, and it is in a different class, even in different package! Otherwise all code that creates the Excel and populate it would have been mixed with the data retrieval logic -> resulting a “nice” spaghetti code.
3. Create the Excel (row by row), save each row on disk
a) Read Report definition (RDL) using SSRS web service
b) Parse RDL (xml format) to a flat, simple model (List<RdlColumn>)
Each column contains:
- Name, display name, format, width, data type
- VB code that computes column visibility or hyperlink
c) Compute cells in a row based on Report definition
- Name, value, type and
- Hyperlink, computed by calling VB code associated to current column
d) Executing Visual Basic code from C#
- Done using CodeDom and Reflection
Note: SSRS Report Definition files are using Visual Basic code to hide / format columns (much in the same way that Excel also uses Visual Basic for Application for writing macros)
e) Create an Excel row and write it to disk (avoiding to load it in memory)
Below is the code that write a cell to the disk right away.
f) Create the Excel
Tips: – IEnumerable is actually retrieved one row at a time (since it is created with yield) – see #2 above
- Func – we use a delegate that transforms a row of data to a computed cells (see #c) above)
4. Tricks for reading / writing data in chunks
- Use ADO.NET DataReader to read row by row from DB
- Use “yield” to create the illusion that I have a collection ready (when in fact each iteration means a new read from DB / XML file on disk)
- Use delegates – so that I separate each of the steps of read, write, prepare data and writing to Excel (without having to load the entire data in memory at each step)
- Use XmlReader / XmlWriter to read / write intermediate data in chunks
- Use OpenXmlWriter – to write data to Excel in chunks (remember, Excel is actually a package with XMLs inside)
- Send the data to the browser in byte chunks
Implementing our custom Excel Export definitely pays off, as can be seen in the metrics.
- In terms of price it is much less expensive
- In terms of performance it is 3.5 times faster
- It can serve up to 10 times more concurrent users
- It consumes much less RAM memory and CPU usage
- The memory usage increase with concurrency only, and not by the volume of data
- It takes more temporary disk space instead of RAM (which is OK)
- For reports that uses advanced reporting features (like aggregate data), extra work will need to be done on converting the report feature to Excel correspondent feature (so you need to learn Open Office format quite well).
So, knowing the trade-offs, you can decide if it fits your needs or not. In our case, where we have large reports but not so complicated, it really paid off implementing it. Good luck!
When delivering software, Daniel is focused on understanding the business in depth, provide feedback early, analyze more solutions available and choose the one that fits best in the existing constraints.
He joined the Cognizant Softvision team in 2007, and has delivered many successful projects during his tenure.
Latest posts by Daniel Moldovan
- Implementing your own SSRS Custom Excel Export to work at scale - April 1, 2019