PetaPoco may cause high memory usage with certain queries

If you are using PetaPoco, or NPoco (which seams to be the most up-to-date fork of the project), the title of this post might be a bit scary… but hopefully you won’t have to worry. This really depends on how you create your queries and how many different query structures you are executing.

High memory usage

Here is the code in relation to the memory growth when using PetaPoco:

https://github.com/toptensoftware/PetaPoco/blob/master/PetaPoco/PetaPoco.cs#L1836

What is happening here is that every time a POCO needs to be mapped from a data source, this will add more values to a static cache, specifically this one:

https://github.com/toptensoftware/PetaPoco/blob/master/PetaPoco/PetaPoco.cs#L2126  (m_PocoDatas)

This isn’t a bad thing… but it can be if you are:

  • using non-parameterized where clauses
  • you have dynamically generated where clauses
  • you use a lot of sql ‘IN’ clauses – since the items in the array being passed to the ‘IN’ clauses is dynamic
  • you have tons of differently statically unique where clauses

Each time a unique SQL query is sent to PetaPoco it will store this SQL string and associate it to a delegate (which is also cached). Over time, as these unique SQL queries are executed, the internal static cache will grow. In some cases this could consume quite a lot of memory.

The other thing to note is how large the ‘key’ that PetaPoco/NPoco uses:

var key = string.Format("{0}:{1}:{2}:{3}:{4}", sql, connString, ForceDateTimesToUtc, firstColumn, countColumns);

Considering how many queries might be executing in your application, the storage for these keys alone could take up quite a lot of memory! An SQL statement combined with a connection string could be very long, and each of these combinations gets stored in memory for every unique SQL query executed that returns mapped POCO objects.

Parameterized queries vs. non-parameterized

Here’s some examples of why non-parameterized queries will cause lots of memory consumption. Lets say we have a simple query like:

db.Query<MyTable>("WHERE MyColumn=@myValue", new {myValue = "test"})

Which results in this SQL:

SELECT * FROM MyTable WHERE MyColumn = @myValue

This query can be used over and over again with a different value and PetaPoco will simply store a single SQL key in it’s internal cache. However, if you are executing queries without real parameters such as:

db.Query<MyTable>("WHERE MyColumn='hello'");
db.Query<MyTable>("WHERE MyColumn='world'");
db.Query<MyTable>("WHERE MyColumn='hello world'");

Which results in this SQL:

SELECT * FROM MyTable WHERE MyColumn = 'hello';
SELECT * FROM MyTable WHERE MyColumn = 'world';
SELECT * FROM MyTable WHERE MyColumn = 'hello world';

Then PetaPoco will store each of these statements against a delegate in it’s internal cache since each of these string statements are not equal to each other.

Depending on your application you still might have a very large number of unique parameterized queries, though I’d assume you’d have to have a terrifically huge amount for it to be a worry.

Order by queries

Unfortunately even if you use parameterized queries, PetaPoco will store the SQL query key with it’s Order By clause which isn’t necessary and will again mean more duplicate SQL keys and delegates being tracked. For example if you have these resulting queries:

SELECT * FROM MyTable WHERE MyColumn = @myValue ORDER BY SomeField;
SELECT * FROM MyTable WHERE MyColumn = @myValue ORDER BY AnotherField;

PetaPoco will store each of these statements in it’s internal cache separately since the strings don’t match, however the delegate that PetaPoco is storing against these SQL statements isn’t concerned about the ordering output, it’s only concerned about the column and table selection so in theory it should be stripping off the last Order By clause (and other irrelevant clauses) to avoid this duplication.

A slightly better implementation

First, if you are using PetaPoco/NPoco, you shouldn’t use dynamic queries for the point’s mentioned above. If you need this functionality then I suppose it might be worth these libraries adding some sort of property on the Database object or a parameter in either the Fetch or Query methods to specify that you don’t want to use the internal cache (this will be slower, but you won’t get unwanted memory growth). I’d really just suggest not using dynamically created where clauses ;-)

Next, there’s a few things that could be fixed in the PetaPoco/NPoco core to manage memory a little better:

  • The size the the key that is stored in memory doesn’t need to be that big. A better implementation would be to use a hash combiner class to combine the GetHashCode result of each of those parameters that make up the key. This is a very fast way to create a hash of some strings that will result in a much smaller key. An example of a hash combiner class is here (which is actually inspired by the various internal hash code combiner classes in .Net): https://github.com/umbraco/Umbraco-CMS/blob/7.2.0/src/Umbraco.Core/HashCodeCombiner.cs
  • Instead of storing all of this cache in static variables, have them stored in an ObjectCache/MemoryCache (http://msdn.microsoft.com/en-us/library/system.runtime.caching.objectcache(v=vs.110).aspx) with a sliding expiration so the memory can get collected when it’s unused
  • The Order By clause should be ignored based on the point mentioned above

I’ve created a PR for NPoco here, and also created an issue on the original PetaPoco source here.

Author

Shannon Thompson

I'm a Senior Software Engineer working full time at Microsoft. Previously, I was working at Umbraco HQ for about 10 years. I maintain several open source projects (many related to Umbraco) such as Articulate, Examine and Smidge, and I also have a commercial software offering called ExamineX. Welcome to my blog :)

comments powered by Disqus