ArrayLust - My geek side

Thursday, 23 May 2013

My first Node.js application

For a long, long time now, I've been thinking about a personal project to do with weather predictions, using Met Office data. I finally got round to starting it last night, using Node.js.

I'd originally thought up the concept because my wife is constantly checking the weather. I can understand this; in the UK, the weather seems to be appalling and unpredictable most of the time. It occurred to me that she was always checking the prediction for a point in time (e.g. the weekend), but that the prediction changes as we approach that point in time, as the Met Office revises their data.

Now, you can get hold of prediction data, but you need to store it and process it. Getting it into a database is the first priority, and since there is a lot of it, the process needs to be quick and scalable. I've been using MongoDB for a couple of years, and I can definitely say I've become a fan. It's not without challenges though: it hogs quite a lot of memory, and it requires quite a lot of disk space. I guess this is partly because every document stores its own field names, so the data isn't packed as tightly as rows in a SQL database. Then I came across Digital Ocean, who offer really cheap servers that are easy to scale - much cheaper and easier to scale than Amazon's EC2, I have to say.

I was originally going to write the app in Java and host it somewhere, but this project gave me an opportunity to use Node.js. I've been using JavaScript a lot over the past few years, and I'm quite familiar with the MongoDB shell, so I thought I'd give it a go.

From starting to look at the docs to getting a working app running on the server, downloading, parsing and inserting data into MongoDB? Three hours. I think that's a bit of a result.

From 5000 or so locations across the UK, I harvested 200,000 predictions for 3-hour markers over the next 5 days. For all subsequent predictions, I'll append the forecast to an array, giving me all the predictions for a point in time and space in a single handy doc. Each doc looks like this for a single prediction:

> db.forecasts.findOne();
{
 "loc" : "3220",
 "fcD" : ISODate("2013-05-27T00:00:00Z"),
 "t" : "360",
 "d" : [
  {
   "D" : "NW",
   "F" : "2",
   "G" : "20",
   "H" : "78",
   "Pp" : "6",
   "S" : "11",
   "T" : "5",
   "V" : "EX",
   "W" : "7",
   "U" : "1",
   "pd" : ISODate("2013-05-23T11:00:00Z")
  }
 ],
 "_id" : ObjectId("519e077c8bf02215220014f5")
}

The loc field is the location ID - location information is stored in a separate collection. fcD is the forecast date - the date on which they expect this weather. t is the time slot in minutes into the day, so t: 360 means 360 minutes after midnight, i.e. 06:00. Each entry in the d array is one prediction, and its 'pd' field means 'prediction date': the entry above says that at 11am today they predicted the weather would have these characteristics. All other fields are stats like temperature, humidity, etc.
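As a rough sketch of the "append each new prediction to the array" idea, the two helpers below build the arguments for an upsert with $push and turn fcD plus t into an actual point in time. The function names are made up for illustration, and this is not the code the app runs. The filter/update/options objects are the kind of thing you would hand to the MongoDB driver's update call.

```javascript
// Hypothetical helper: builds the pieces of an upsert that appends one
// prediction to the doc for a given location, forecast date and time slot.
// If no such doc exists yet, the upsert creates it with a one-element array.
function buildAppendPrediction(loc, fcD, t, prediction) {
  return {
    filter: { loc: loc, fcD: fcD, t: t },
    update: { $push: { d: prediction } },
    options: { upsert: true }
  };
}

// Hypothetical helper: combines fcD (midnight) with t (minutes into the day,
// stored as a string in the docs above) to get the forecast instant.
function forecastInstant(doc) {
  return new Date(doc.fcD.getTime() + Number(doc.t) * 60 * 1000);
}
```

Keying the filter on loc, fcD and t is what puts every prediction for one point in time and space in the same doc.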

That's all the app does right now, but the next step once I've harvested some data is to start analysing it. Some things I have in mind:

What is the average delta between first and last predictions for a point in time?
Graph predictions over time for a point in time, i.e. how much do the predicted temperature, humidity etc. vary from the first to the last prediction?
How do different weather stations compare? Are some more accurate than others? This might be regional or down to equipment or personnel at a particular site.
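The first of these questions could be answered along the following lines. This is only a sketch over docs shaped like the one above: the function names are hypothetical, and I'm assuming a numeric field stored as a string (for example "T", which I take to be temperature).

```javascript
// Change in one field between the earliest and the latest prediction
// (ordered by pd) for a single forecast doc. Returns null if the doc
// has no predictions yet.
function firstLastDelta(doc, field) {
  const sorted = doc.d.slice().sort((a, b) => a.pd - b.pd);
  if (sorted.length === 0) return null;
  const first = Number(sorted[0][field]);
  const last = Number(sorted[sorted.length - 1][field]);
  return last - first;
}

// Mean of the per-doc deltas, skipping docs without usable values.
function averageDelta(docs, field) {
  const deltas = docs
    .map(doc => firstLastDelta(doc, field))
    .filter(delta => delta !== null && !Number.isNaN(delta));
  if (deltas.length === 0) return null;
  return deltas.reduce((sum, delta) => sum + delta, 0) / deltas.length;
}
```

The deltas are signed, so revisions up and down cancel out in the average. Wrapping the delta in Math.abs would measure how much predictions move regardless of direction.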

There is also other data - what actually happened. It'd be nice to compare this across months in a Circos style graph, linking each temperature to the months that had days with the same max or min temperature.

I plan to use D3.js for graphing. I've used it a little before. It's the kind of JavaScript library that makes you change your way of coding. It certainly helped me pick up Node.js quickly.

Wednesday, 23 May 2012

StackOverflow, meet Raspberry Pi

Like many people (in the hundreds of thousands, so I understand), I'm waiting for a Raspberry Pi.

As an avid StackOverflow user, I'm very much interested in promoting a new beta site they're proposing, which will be dedicated to the Pi. StackOverflow's model promotes participation and, moreover, promotes correct answers. I find I'm using it more and more: searching for answers, asking questions and trying to help others.

If you're awaiting a Pi like me, take a look, cast a vote:

Stack Exchange Q&A site proposal: Raspberry Pi

Friday, 14 October 2011

Awesome jQuery from Ubuntu

Ubuntu 11.10 has an online demo, which is made up from a load of JavaScript, but looks like a real UI. The virtual Firefox app has history states and lets you open other websites - including the demo site...

It's an awesome bit of coding with massive potential. They're not far away from making a web based UI layer that would allow users to manage virtual servers over port 80...

http://www.ubuntu.com/tour/#

Tuesday, 9 August 2011

My experiences with JMeter and PeopleSoft

Over the past few months, I've carried out more performance testing than I'd have chosen to, mostly against PeopleSoft in different environments. There are a number of sites that gave me useful tips to get through the process, and I thought I'd share them here.

It's also useful to know what breaks :)

About JMeter

JMeter is an extensible Java based tool to conduct various different types of testing, including functional, regression, load, soak and stress testing. Tests are controlled either from a graphical user interface or from the command line, and the ‘master’ host can control several ‘injectors’.

JMeter is an open source package from the Apache Software Foundation. For this project, it carries several advantages:

1. It carries no license fee
2. It can be extended
3. It has a fairly simple user interface
4. It supports multiple test injectors, required to push the system to its limits

JMeter is not a perfect tool, and has some disadvantages too, such as a slightly buggy user interface and rather sparse reporting capabilities. Despite this, it is the best of the crop of open source options. Third party plugins can mitigate the sparse reporting.

My PeopleSoft test plans generally follow this structure:

Basically, this test will just perform a login. It includes a bunch of elements required for any PeopleSoft test. It extracts a session variable, ICSID, using a regular expression, which you can then use in other test elements further down. It includes a cookie manager (set to 'compatibility mode' - the default setting will not work). And it includes a sampler to 'grab test data', which accesses a simple web service I wrote that provides a row of test data in XML format. This adds next to no additional time to the test, but is really useful: when testing using multiple test injectors, it means that I don't need to copy data across to each server.
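To show what the ICSID extraction amounts to, here is a small sketch of pulling the value out of a login response with a regular expression. The shape of the hidden field is an assumption on my part (check it against your own responses in a 'View Results Tree'). In JMeter itself you would put a similar pattern into a Regular Expression Extractor rather than write code. The names below are made up for illustration.

```javascript
// Assumed markup: a hidden input named ICSID with a value attribute
// somewhere after the name attribute. Real PeopleSoft pages may differ.
const ICSID_PATTERN = /name=['"]ICSID['"][^>]*value=['"]([^'"]*)['"]/;

// Returns the captured ICSID value, or null if the page has no such field
// (for example because the response is an error page).
function extractIcsid(html) {
  const match = ICSID_PATTERN.exec(html);
  return match ? match[1] : null;
}
```

Returning null on a miss mirrors the useful habit of giving the extractor an obvious default value, so a failed extraction is easy to spot in later requests.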

Recording the test is reasonably straightforward, but there are some gotchas. Firstly, you want to add an HTTP proxy element to the 'workspace' element, and set that up to use HTTP Client for the HTTP samplers. You also want to add some variables to a config element, including the PeopleSoft environment name, host, port, and protocol. For example:

PsEnv=DMO
host=localhost
port=80
protocol=http

If you do this, then when the proxy is recording test elements, it'll automatically replace DMO with ${PsEnv}, etc., saving an awful lot of time processing your test afterwards. You'll also find that most processes in PeopleSoft actually entail multiple GET and POST requests to the same URL. When you run your test, these will get grouped in the results. You can tell JMeter to prefix each sampler name with a counter by editing the jmeter.properties file.
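The effect of that substitution on a recorded URL can be pictured with the naive sketch below. This is not how JMeter implements it, just an illustration of the idea, and the function name and the sample URL are made up.

```javascript
// Replaces every occurrence of each variable's value with a ${name} reference.
// Longer values are handled first so a short value cannot split a longer one.
function parameterise(text, variables) {
  const entries = Object.entries(variables)
    .sort((a, b) => b[1].length - a[1].length);
  let result = text;
  for (const [name, value] of entries) {
    if (value === '') continue; // an empty value would match everywhere
    result = result.split(value).join('${' + name + '}');
  }
  return result;
}

console.log(parameterise('http://localhost:80/psp/DMO/EMPLOYEE/HRMS/c/X', {
  PsEnv: 'DMO', host: 'localhost', port: '80', protocol: 'http'
}));
```

A blunt replacement like this also shows why short values such as 80 deserve a second look after recording: they can match in places that have nothing to do with the port.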

An article on how to configure the ICSID parameter is here: http://www.erpassociates.com/peoplesoft-corner-weblog/utilities/using-jmeter-for-peoplesoft-performance-testing.html

As for the ICStateNum parameter, if you leave the recorded value in your tests, you'll get the login page all the time. It seems to be OK to simply remove it from tests.

When you're looking at results, always add a 'View Results Tree' element while you're developing the test - this will allow you to see individual HTTP requests and responses (rendered as HTML!). This is incredibly useful.

If you decide to add 'user wait time' to your test, so you can pad out a transaction to take about as long as a real user would take, be aware of the scope of timers in JMeter. If you have a sampler followed by a timer, followed by two more samplers, all at the same level, then all three samplers will get the same delay added, including the one above the timer. This is probably not what you want. Instead, add each timer as a child node of a single sampler, so it only applies to that sampler. Note that JMeter applies the delay before the sampler runs, so attach the timer to the sampler that follows the pause.
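The scoping rule is easier to see with a toy model. The sketch below is a simplification I've written for illustration, not JMeter code: a test plan is a tree of nodes, only constant timers exist, and a timer applies to its parent sampler if it has one, or otherwise to every sampler underneath its parent. The node shape and function name are hypothetical.

```javascript
// Returns an object mapping each sampler name to the total delay (ms)
// applied before it, given the simplified scoping rule described above.
function delaysBySampler(node, inherited = 0, out = {}) {
  const children = node.children || [];
  // Timers that are direct children of this node are in scope here.
  const ownTimers = children
    .filter(child => child.type === 'timer')
    .reduce((sum, timer) => sum + timer.delayMs, 0);
  const total = inherited + ownTimers;
  if (node.type === 'sampler') {
    out[node.name] = total;
    return out;
  }
  for (const child of children) {
    if (child.type !== 'timer') delaysBySampler(child, total, out);
  }
  return out;
}
```

With a thread group containing sampler A, a 3000 ms timer, then samplers B and C, this gives 3000 ms for all three. Moving the timer underneath B gives 3000 ms for B and 0 for A and C.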

I also found the following useful:

http://javaworks.wordpress.com/2010/06/15/functions-in-jmeter-bean-shell/
http://www.javaworld.com/javaworld/jw-07-2005/jw-0711-jmeter.html?page=1
http://code.google.com/p/jmeter-plugins/downloads/detail?name=JMeterPlugins-0.4.1.zip

Especially the last item - JMeter plugins has some very useful charts, including threads over time - if your test uses many injectors, this is a useful way to see when your test is ramped up, and if there are problems with the injectors failing.

Finally, it's a good idea to run JConsole against your injector JVMs. This will give you a feel for how much load you can push through from a single host. In our tests, we found we were able to push through 800 to 1000 users (threads) per host, each with user wait time. All it took was tweaking the JVM heap size in the jmeter.bat file, since the servers we were using as injectors each had plenty of CPU power and memory.

Thursday, 26 August 2010

Java just isn't being friendly today

Is it just me? Shouldn't Java app servers be more consistent?

I'm developing a commercial web application. This app must work in a variety of application servers, on a variety of databases, with minimal configuration of the kind that doesn't require users to understand Java.

So, I'm using Hibernate, and after a lot of faffing, have got Hibernate to work using a SequenceStyleGenerator (very nice) so it can handle different id generation schemes (sequences vs auto-increment).

I'm also using Spring, for dependency injection, transactions, and a bunch of other things like MVC, annotations and security. Spring is very nice for all of these things, but is a bit of a pain when it comes to changing anything without opening up the WAR / EAR.

My datasource, defined in Spring config, gets DB connections from a JNDI lookup. I define that JNDI lookup in the application server, but of course every application server exposes JNDI in a different way, so unless I want to maintain different WAR file builds for different application servers, I need a way to change the properties on startup.

Recently, I started using Spring's PropertyPlaceholderConfigurer, which takes a properties file as an argument: in my case 'classpath:my.properties'. This allows me to externalise the properties for the key things that need to be changed, like the datasource.


<bean id="propertyConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
 <property name="location">
  <value>classpath:my.properties</value>
 </property>
</bean>

So now my datasource bean declaration is

<jee:jndi-lookup id="dataSource" jndi-name="${myDatasource}" />

All well and good. In most app servers, you then just drop the properties file on the classpath. In WebLogic (on Windows in this case), however, try as I might, I just couldn't get it to recognise the properties file, and since the datasource is pretty central to the app, the app just didn't start. It's apparently not enough to just put the file in the lib directory that WebLogic happily loads JAR files from.

The solution, which was simple, took a long time as I tried different things. It involves editing %WL_HOME%/user_projects/domains/base_domain/bin/setDomainEnv.cmd and adding the following line near the top:

set CLASSPATH=.;%CLASSPATH%

All this does is add the current directory to the classpath. Then, just drop the properties file (my.properties in my case, containing a line such as myDatasource=jdbc/myDS) into %WL_HOME%/user_projects/domains/base_domain and you're good to go.

Next problem! JNDI woes. I'd done this before in WebLogic and thought I had it sussed, so was disappointed when it suddenly stopped working. I'd created my datasource with a JNDI name of jdbc/myDS, and I was looking up that same name from my Spring config (no need to prepend java:comp/env/ in WebLogic). But it didn't work. The solution, again simple when you know, is to make sure that you don't click 'Finish' prematurely when creating the datasource, but keep clicking 'Next' through to the final screen, where you choose the 'targets'. If you're using a default install, that target will be 'AdminServer'. Then it will work.

Wednesday, 9 December 2009

Nice slides about negative corporate environments

Corporate Design vs Startup Design - A Love Story (presentation from Amir Khella)

Thursday, 26 November 2009

Sequences not allowed here

OK, this was just annoying, but since I couldn't find anything on the web when I looked, here's what my problem was:

I was modifying some Hibernate code to use the SequenceStyleGenerator for ID generation, so it will work cross-DB. I'd initially developed using MySQL, and had a MySQL Hibernate dialect specified in my config files. I exported the schema to an Oracle XE instance using hbm2ddl in Eclipse, and started using an Oracle driver in my datasource.

Every time I tried to insert anything, I got an Oracle error - "sequence number not allowed here" (ORA-02287). The MySQL dialect was still specified in my config files at that point. Ultimately, just removing the dialect entirely from the config files worked, since Hibernate can auto-detect the dialect most of the time.