This is a guest post by Chris, a RavenDB user, originally posted on the RavenDB group.
I’ve been fiddling with document databases for a while, trying to find a way to aggregate our logs without having to denormalize everything. I started with CouchDB, but given the odd status of the project after its founder abandoned it, and since I work in a Windows world, Raven ultimately seemed like a wholly better solution: all the goods of Couch plus a .NET API.
I built a data model in C# and used a CSV library to push a few hundred thousand records into Raven. The goal is to allow support engineers to query this log information with some simple PowerShell one-liners.
I thought parsing 100 MB of CSV and jamming in the thousands of corresponding documents would be the difficult part to figure out, and that the one-liners would be a couple of hours of work at most.
Yes, it was a bit challenging to build an efficient importer, but most of that was due to my limited knowledge of C#. When I started to build my Posh one-liners and some queries returned invalid JSON, I assumed the importer was still not working. The PowerShell 3 CTP introduces some new cmdlets for handling JSON and one specifically for calling HTTP REST APIs: Invoke-RestMethod (alias irm). irm is 100 times more user friendly than curl, and it goes a step further by automatically deserializing the content returned. So it’s extremely easy to consume JSON via a REST API, but I kept running into an issue where Posh would throw an error saying there was no Results property (the deserialized JSON) on the object. I must have been pushing in bad data: garbage in equals garbage out.
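As a rough illustration of the deserialization step that Invoke-RestMethod performs, here is a sketch that runs without a server. The JSON string is a made-up sample shaped like a query response (only the Results property comes from the story above; the other field names are assumptions), and ConvertFrom-Json stands in for the automatic conversion.

```powershell
# Hypothetical sample shaped like an index query response; real responses carry more fields.
$json = '{"Results":[{"Customer":"blah","Message":"started"}],"TotalResults":1}'

# Invoke-RestMethod does this conversion for you when the server returns JSON.
$response = $json | ConvertFrom-Json

# Check for the Results property before using it, so a bad parse shows up as a
# clear warning instead of a confusing property error further down the pipeline.
if ($null -ne $response.PSObject.Properties['Results']) {
    $messages = $response.Results | ForEach-Object { $_.Message }
    $messages
} else {
    Write-Warning 'No Results property; the response was probably not parsed as JSON.'
}
```

If the content cannot be turned into an object, there is nothing to hang a Results property on, which matches the symptom described above.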
I thought the HTML in some of the log values was invalidating the JSON string. I deleted all the documents, added a call to HTTPEncode on every string before storing it in Raven, and reloaded the records. I ended up with the same results, or rather the same lack of a Results property.
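The actual problem turned out to be a missing quote around a key name in the JSON that PowerShell was seeing. I almost posted this as a RavenDB bug, but I checked the raw HTTP response in Fiddler and found that the quote was there, so the character was being lost on the PowerShell side.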
What to do about it
I am working with a Technology Preview release of PowerShell 3, so I checked Microsoft Connect and found a similar report of missing characters in content deserialization.
“It seems that the last byte of the second HTTP response packet is being dropped by powershell. This only seems to happen when the second response packet is smaller than the first.”
Microsoft closed this issue as “Won’t Fix” and suggested using Invoke-WebRequest instead. Invoke-WebRequest removes all the magic of Invoke-RestMethod and just gets the raw HTTP content. I piped the Content property to ConvertFrom-Json, another cmdlet new in PowerShell 3, and got properly deserialized JSON:
$c = Invoke-WebRequest "http://localhost:8080/indexes/vovici/Logs/process?query=customer:blah"
$r = $c.Content | ConvertFrom-Json
So it’s an extra step, but if you want to reliably access RavenDB’s HTTP API via PowerShell 3, that seems to be the way to do it.
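To keep the support engineers' queries down to one line, the two steps can be wrapped in a small helper. This is only a sketch: the function name Get-RavenJson is made up for illustration, and it assumes the server answers with a JSON body.

```powershell
# Hypothetical helper (the name is an assumption, not part of RavenDB or PowerShell).
function Get-RavenJson {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Uri
    )

    # Fetch the raw response, skipping Invoke-RestMethod's automatic deserialization.
    $response = Invoke-WebRequest -Uri $Uri

    # Deserialize the body explicitly and emit the resulting object.
    $response.Content | ConvertFrom-Json
}
```

With that in place, a query becomes a one-liner again, for example: (Get-RavenJson "http://localhost:8080/indexes/vovici/Logs/process?query=customer:blah").Results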
UPDATE: It does look like Microsoft fixed the lost-character issue in the Invoke-RestMethod cmdlet in the latest beta release. I can pull down 1024 documents with one web request and get a usable, deserialized object.