Hibernating Rhinos

Zero friction databases

Data Virtualization in Silverlight: Digging into VirtualCollection

This is a guest post by Samuel Jack, a freelancer who has worked with us on the new RavenDb Studio features.

The other day I showed you how we enabled the display of massive scrollable lists of items in Silverlight by implementing data virtualization for DataGrids and ListBoxes. If you’ve looked at that sample, you might be curious as to how it all works.

That is the subject for todays post.

What is Virtualization?

If you’re familiar with Silverlight or WPF, you’ll know all about virtualization when it comes to ListBoxes and DataGrids. Virtualization is a trick the framework uses to avoid having to render UI for items in a list that are not visible on screen. It creates UI for items that are visible, and as the user scrolls, it destroys the UI elements for data that has disappeared off screen, and creates new elements for items that have just come into view. VirtualizingStackPanel is one of the classes in WPF and Silverlight that is responsible for handling this on behalf of controls like the ListBox.

VirtualCollection, the class that lies at the heart of our solution, takes this one step further, and applies the same idea to the items in the list themselves. Rather than loading all the items into memory up front, it pulls in just the ones that are needed as the user scrolls (which might involve making a call to a server), and unloads any items which are not likely to be needed again any time soon. Sounds simple enough in theory, but getting this to work in Silverlight involves a bit of trickery.

Appetite reduction therapy, Silverlight style

The first problem we come up against is Silverlight’s I-want-it-and-I want-it-all-NOW attitude to data. If you present a Silverlight ListBox or DataGrid with a simple IList to be its data source, Silverlight will ask the list for its Count, and then proceed to request every single item in the list, irrespective of how much it can actually display at that moment. This isn’t a problem in most use cases, but when we want our VirtualCollection to display lists with potentially billions of data items, that approach just isn’t tenable. So we have to find a way to get Silverlight to slim down its appetite for data.

The trick here, it turns out, is to implement ICollectionView as well as IList. Flaunting this extra interface seems to be enough to persuade Silverlight to back down, and just ask for the items it can actually display, as it needs them. There is some extra work involved, as ICollectionViews are supposed to be responsible for keeping track of the current item selected in a list, but we can handle that.

So with ICollectionView implemented, Silverlight now asks the VirtualCollection for items it wants to display one by one. But that leads to the next problem. When a ListBox, say, first comes knocking on  the door,  VirtualCollection doesn’t have any items in stock. They’re all on the server, and have to be fetched into memory. But VirtualCollection can’t keep Silverlight waiting whilst it goes and does that, or Silverlight will sulk and hang the browser. So VirtualCollection fobs Silverlight off by giving it back a VirtualItem which acts as a place-holder for the real item. Silverlight can then happily create a row in the ListBox, albeit a blank row, because it has no actual data yet. When the real item arrives, fresh from the server, VirtualCollection hands it to the VirtualItem, and through the magic of data binding, the row lights up with the appropriate data.

Let’s take a look at some code, to see how VirtualCollection implements this process.

VirtualItems

We’ll start where Silverlight starts, when it wants to display a particular item: at VirtualCollection’s indexer:

public VirtualItem<T> this[int index]
{
  get
  {
      RealizeItemRequested(index);
      return _virtualItems[index] ?? (_virtualItems[index] = new VirtualItem<T>(this, index));
  }
  set { throw new NotImplementedException(); }
}

Two things are happening here: first up, we’re kicking off a request to realize the item – fetch it from the server, in other words. Secondly, we’re creating a VirtualItem to stand in for the real item – or if we’ve already created a VirtualItem representing that index, we simply return that.

It’s worth saying a word about how we store all these VirtualItems, because they are not stored in a standard List. The reason for that is that there could be billions of them, and we wouldn’t want to allocate an array that big (which is what a List would do). That would be especially wasteful given that the vast majority of the array would never be used, since the user is unlikely to look at every single item.

So VirtualItems are stored in a SparseList, which allows items to be accessed by index as conveniently as if they  were in one big array. Under the covers it actually stores items in lots of smaller arrays which can be more easily garbage-collected when they’re no longer needed.

To Fetch a Page of Data

Let’s chase the process along a little further.

public void RealizeItemRequested(int index)
{
  var page = index / _pageSize;
  BeginGetPage(page);
}

private void BeginGetPage(int page)
{
  if (IsPageAlreadyRequested(page))
  {
      return;
  }

  _mostRecentlyRequestedPages.Add(page);
  _requestedPages.Add(page);

  _pendingPageRequests.Push(new PageRequest(page, _state));

  ProcessPageRequests();
}

Items are never fetched from the server as individuals, but always in pages, so that we can make the most of each round-trip across the wire. The first thing that RealizeItemRequested does, therefore, is work out which page the requested index is part of. Then it puts in a request to fetch that page.

BeginGetPage doesn’t do any fetching directly. After ensuring it isn’t wasting time fetching a page it already has, what it does do is push a fetch-request for the page on to a stack, _pendingPageRequests. It also adds the page to a list it maintains of the mostly recently requested pages. This list is used to manage the cache of pages which are kept in memory. When the list gets too long, the oldest pages are evicted from memory.

Finally BeginGetPage goes off to process the page requests.

private void ProcessPageRequests()
{
  while (_inProcessPageRequests < MaxConcurrentPageRequests && _pendingPageRequests.Count > 0)
  {
      var request = _pendingPageRequests.Pop();

      // if we encounter a requested posted for an early collection state,
      // we can ignore it, and all that came before it
      if (_state != request.StateWhenRequested)
      {
          _pendingPageRequests.Clear();
          break;
      }

      // check that the page is still requested (the user might have scrolled, causing the 
      // page to be ejected from the cache
      if (!_requestedPages.Contains(request.Page))
      {
          break;
      }

      _inProcessPageRequests++;

      _source.GetPageAsync(request.Page * _pageSize, _pageSize, _sortDescriptions).ContinueWith(
          t =>
          {
              if (!t.IsFaulted)
              {
                  UpdatePage(request.Page, t.Result, request.StateWhenRequested);
              }
              else
              {
                  MarkPageAsError(request.Page, request.StateWhenRequested);
              }

              // fire off any further requests
              _inProcessPageRequests--;
              ProcessPageRequests();
          },
          _synchronizationContextScheduler);
  }
}

There are a number of interesting design decisions revealed by ProcessPageRequests, so it’s worth looking at in some detail.

First, why do we use a stack for managing the pending page requests, and not, say, a queue? Well, think about what happens when a user scrolls a long way through a list, and then stops. As they scroll, page requests will be generated, and will take some time to process, so the queue would grow. By the time the user stops scrolling, the queue might be quite long, and it would take a little time for the most recent requests, the ones for the pages the user is now staring at, to be dealt with. By putting the requests in a stack, which is a first-in-first-out data structure, we ensure that the most recent requests, those most likely to be relevant, are dealt with first.

We might also find that there are some requests which are no longer relevant. If the user scrolled a very long way, causing lots of pages to be added to the _mostRecentlyRequestedPages list, there may well be some pages which have to be kicked out of the cache. Any fetch requests for those pages can be safely ignored.

Requests can become irrelevant for another reason too: the collection state may have moved on since the request was made. The collection state is a concept that helps to make sure the VirtualCollection does the right thing at the right time in spite of all the asynchronous activity going on. The state is simply a number which is incremented every time VirtualCollection is notified by the IVirtualCollectionSource that the collection has changed (this is done in UpdateData and Reset). Whenever an asynchronous action is started, say a request to fetch data from the server, VirtualCollection tags the request with the state at that instant. When the response of the asynchronous action comes back, VirtualCollection compares its current state with the state when the action was begun. If the two are different, VirtualCollection knows that the world has moved on, so that response is no longer relevant.

Notice, finally, in this method that when we fire off the asynchronous request to get a page of data – this is where the IVirtualCollectionSource comes into play – we make sure the response is handled in the UI thread;  that is the purpose of passing _synchronizationContextScheduler to ContinueWith. All responses to asynchronous requests are handled in this way to make sure we don’t have any race conditions when updating VirtualCollection’s internal state.

Acting on the Data

What happens when the data is received from the server?

private void UpdatePage(int page, IList<T> results, uint stateWhenRequested)
{
  if (stateWhenRequested != _state)
  {
      // this request may contain out-of-date data, so ignore it
      return;
  }

  bool stillRelevant = _requestedPages.Remove(page);
  if (!stillRelevant)
  {
      return;
  }

  _fetchedPages.Add(page);

  var startIndex = page * _pageSize;

  for (int i = 0; i < results.Count; i++)
  {
      var index = startIndex + i;
      var virtualItem = _virtualItems[index] ?? (_virtualItems[index] = new VirtualItem<T>(this, index));
      if (virtualItem.Item == null || results[i] == null || !_equalityComparer.Equals(virtualItem.Item, results[i]))
      {
          virtualItem.SupplyValue(results[i]);
      }
  }

  if (results.Count > 0)
  {
      OnItemsRealized(new ItemsRealizedEventArgs(startIndex, results.Count));
  }
}

Data coming back from the server is passed to UpdatePage. Before doing anything with the data, the method checks that it is still relevant – that the collection state hasn’t changed, and that the page to which the data belongs hasn’t been evicted from the cache. Then it takes each item and passes it to its corresponding VirtualItem.

Two snippets from VirtualItem will suffice to show what’s going on there:

public class VirtualItem<T> : INotifyPropertyChanged where T : class
{
    // Snip ...
    
   public T Item
   {
       get
       {
           if (!IsRealized && !DataFetchError)
           {
               _parent.RealizeItemRequested(Index);
           }
           return _item;
       }
       private set
       {
           _item = value;
           OnPropertyChanged(new PropertyChangedEventArgs("Item"));
           OnPropertyChanged(new PropertyChangedEventArgs("IsRealized"));
           IsStale = false;
       }
   }

   public void SupplyValue(T value)
   {
       DataFetchError = false;
       Item = value;
   }
}

VirtualItem makes the real item available through its Item property, and when SupplyValue is called this leads to a PropertyChanged event being raised for the Item property, so the Silverlight databinding infrastructure knows it needs to update the UI for that item.

Handling Updates

There’s one other important aspect of VirtualCollection to consider, and that is how it handles changes to the underlying collection of data.

It is the responsibility of the IVirtualCollectionSource to tell VirtualCollection when these changes happen.

The events are handled here:

private void HandleSourceCollectionChanged(object sender, VirtualCollectionSourceChangedEventArgs e)
{
  var stateWhenUpdateRequested = _state;
  if (e.ChangeType == ChangeType.Refresh)
  {
      Task.Factory.StartNew(() => UpdateData(stateWhenUpdateRequested), CancellationToken.None,
                            TaskCreationOptions.None, _synchronizationContextScheduler);
  }
  else if (e.ChangeType == ChangeType.Reset)
  {
      Task.Factory.StartNew(() => Reset(stateWhenUpdateRequested), CancellationToken.None,
                            TaskCreationOptions.None, _synchronizationContextScheduler);
  }
}

There are two types of changes we consider, a Refresh, and a complete Reset. Refresh happens when much of the data may well have remained unchanged, and it makes sense to continue to display it whilst we check with the server. A Reset happens when the entire collection has changed and displaying stale items would be inappropriate.

Notice that we are dispatching the events to be handled on the UI thread (there’s that _synchronizationContextScheduler again), and as we do so, we tag the request with the current collection state so that when we get round to handling the requests we can ignore any that are no longer relevant.

The Reset method contains nothing of great interest, but UpdateData explains something quite important:

protected void UpdateData(uint stateWhenUpdateRequested)
{
  if (_state != stateWhenUpdateRequested)
  {
      return;
  }

  _state++;

  MarkExistingItemsAsStale();

  _fetchedPages.Clear();
  _requestedPages.Clear();

  UpdateCount();

  var queryItemVisibilityArgs = new QueryItemVisibilityEventArgs();
  OnQueryItemVisibility(queryItemVisibilityArgs);

  if (queryItemVisibilityArgs.FirstVisibleIndex.HasValue)
  {
      var firstVisiblePage = queryItemVisibilityArgs.FirstVisibleIndex.Value / _pageSize;
      var lastVisiblePage = queryItemVisibilityArgs.LastVisibleIndex.Value / _pageSize;

      int numberOfVisiblePages = lastVisiblePage - firstVisiblePage + 1;
      EnsurePageCacheSize(numberOfVisiblePages);

      for (int i = firstVisiblePage; i <= lastVisiblePage; i++)
      {
          BeginGetPage(i);
      }
  }
  else
  {
      // in this case we have no way of knowing which items are currently visible,
      // so we signal a collection reset, and wait to see which pages are requested by the UI
      OnCollectionChanged(new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
  }
}

Remember that the purpose of UpdateData is to check that each item in the collection remains up to data. But it has to be frugal: it can’t go off to the server to ask about every possible item. Once again, it needs to limit itself to just those items that are visible.

One way of doing this, the way VirtualCollection worked when I first implemented it, is to raise a CollectionChanged event with an NotifyCollectionChangedAction of Reset. This signals to any listening ListBoxes or DataGrids that the whole collection has changed and they should rebuild their display of the data. They then respond by calling VirtualCollection’s indexer for each index  in the visible range, and thus VirtualCollection finds out which pages it should refresh. In many circumstances this is the only way of refreshing the collection because we have no other way of knowing exactly which items are visible.

But it is a rather sledgehammer approach, rebuilding all the rows in the UI, when we could just update the data within those rows, if only we knew which items are currently visible. So before resorting to a collection reset, VirtualCollection will try another approach. It tries asking nicely anybody who might be listening which items are visible.

It does this by raising the QueryItemVisibility event. To make sure this request doesn’t go unheeded, you need to attach a behaviour to your DataGrid or ListBox which will respond to the event. As part of the code sample, I have provided the ProvideVisibleItemRangeFromDataGridBehavior and the ProvideVisibleItemRangeFromItemsControlBehavior – though note that currently, this last behavior only does anything useful if the ItemsControl (usually in the form of a ListBox) is using a VirtualizingWrapPanel to display its items. See MainPage.xaml in the sample for an example of how to attach these behaviors.

The Final Word

And that concludes our dissection of VirtualCollection. Remember that you can try out a live demo here, and get all the code on github so that you can put it to use in your own projects.

Do let us know how you get on.

Next time: exploring the VirtualizingWrapPanel.

Comments

ToddB
11/21/2012 07:19 PM by
ToddB

SWEET! Thanks!

Comments have been closed on this topic.