2012-05-19

Rant: #pragma(pack) evilness

Wasting time on stuff like this can ruin an otherwise great day:

#define WIN32_LEAN_AND_MEAN
#include <winsock2.h>
#include <windows.h>
#include <stdio.h>
int main() 
{ 
  printf("size: %d\n", (int)sizeof(JOBOBJECT_BASIC_LIMIT_INFORMATION)); 
  return 0; 
}

cl test.cpp && test.exe
size: 44 (wrong)

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <winsock2.h>
#include <stdio.h>
int main() 
{ 
  printf("size: %d\n", (int)sizeof(JOBOBJECT_BASIC_LIMIT_INFORMATION)); 
  return 0; 
}

cl test.cpp && test.exe
size: 48 (right)

I have just spent a few hours trying to figure out why SetInformationJobObject() kept failing in one of my projects.
And when it's because Microsoft can't decide whether LARGE_INTEGER should be aligned to 4 or 8 bytes, I can't help but curse their name.

The problem appears to be that winsock2.h does a "pragma pack(4)" before including windows.h. This messes up some of the types defined in windows.h, including JOBOBJECT_BASIC_LIMIT_INFORMATION.

This was tested with VC2010 and the v7.0A sdk.
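
The 44 versus 48 can be reproduced on paper. The sketch below is a simplified model of how a compiler lays out a struct under a given pack value. The member list is an assumption based on my reading of the 32-bit declaration of JOBOBJECT_BASIC_LIMIT_INFORMATION (two LARGE_INTEGERs followed by seven 4-byte fields), and struct_size is a made-up helper, not something from the SDK.

```python
def struct_size(members, pack):
    """Size of a struct given (size, natural_alignment) members and a pack limit.

    Simplified model: each member is aligned to min(natural alignment, pack),
    and the total size is rounded up to the largest alignment that was used.
    """
    offset = 0
    max_align = 1
    for size, align in members:
        align = min(align, pack)
        max_align = max(max_align, align)
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
    return (offset + max_align - 1) // max_align * max_align


# Assumed 32-bit layout: 2 x LARGE_INTEGER (8 bytes), then 7 x 4-byte fields.
JOB_LIMIT_INFO_32BIT = [(8, 8), (8, 8)] + [(4, 4)] * 7

print("pack(8):", struct_size(JOB_LIMIT_INFO_32BIT, 8))
print("pack(4):", struct_size(JOB_LIMIT_INFO_32BIT, 4))
```

With 8-byte packing the struct gets 4 bytes of tail padding so that arrays of it keep the LARGE_INTEGERs aligned. With 4-byte packing that padding disappears, which is exactly the 4 byte difference seen above.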

2012-03-04

FT232R BitBang mode is broken.


While trying to use an FTDI FT232RL in synchronous bitbang mode, I (like many others, apparently) discovered that this feature is completely broken in most chips currently on the market.

The problem is described in the errata, but not anywhere in the datasheet or the appnote describing the bitbang modes.

It annoys me to no end that the bug is not described in the bitbang appnote. It would have saved me hours of frustration, and could possibly save other people quite a bit of wasted time too.

When I was trying to find out what was wrong with my project, I ended up with this small repro case, which should have resulted in a 500 Hz square wave signal.

#include <stdio.h>
#include <stdlib.h>   // exit()
#include <stdint.h>   // uint8_t
#include <string.h>
#include <ftdi.h>

int main()
{
  ftdi_context ftdic;
  ftdi_init(&ftdic);

  if(ftdi_usb_open_desc(&ftdic, 0x0403, 0x6001, NULL, NULL) < 0)
  {
    fprintf(stderr, "ftdi_usb_open_desc failed: %s\n", 
      ftdi_get_error_string(&ftdic));
    exit(1);
  }

  if(ftdi_set_baudrate(&ftdic, 1000) < 0)
  {
    fprintf(stderr, "ftdi_set_baudrate failed: %s\n", 
      ftdi_get_error_string(&ftdic));
    exit(1);
  }

  if(ftdi_set_bitmode(&ftdic, 0x01, BITMODE_SYNCBB) < 0)
  {
    fprintf(stderr, "ftdi_set_bitmode failed: %s\n", 
      ftdi_get_error_string(&ftdic));
    exit(1);
  }

  uint8_t data[256];
  for(int i=0; i<sizeof(data); i++)
    data[i] = i&1;

  for(;;)
  {
    ftdi_write_data(&ftdic, data, sizeof(data));
    uint8_t data2[256];
    ftdi_read_data(&ftdic, data2, sizeof(data2));
  }

  return 0;
}


But when hooking up a scope to the tx line of the FT232, it's easy to see that the output is a complete mess. The pulse widths are all over the place, from 60μs to 1ms.

In the errata, it's suggested that the fix is to run with a baudrate of 3000000, and just generate longer pulses to compensate. This might sound like a usable workaround, but FTDI neglected to mention that the chip only runs at USB full speed, 12 Mbit/s. That is far from the ~65 Mbit/s needed for 3000000 baud: 3000000 baud * 10 bits/byte (with bit stuffing) * 2 (we need to both send and receive), plus 10% for USB header overhead.
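
The back-of-the-envelope estimate above, spelled out. All the factors are the rough figures from the paragraph, not measured values, and the variable names are just for illustration.

```python
# Rough estimate of the USB bandwidth needed for 3000000 baud sync bitbang.
baud = 3_000_000        # bytes clocked out per second
bits_per_byte = 10      # 8 data bits plus an allowance for bit stuffing
directions = 2          # every byte written is also read back
usb_overhead = 1.10     # 10% allowance for USB headers

needed_bits_per_sec = baud * bits_per_byte * directions * usb_overhead
print("needed: %.0f Mbit/s" % (needed_bits_per_sec / 1e6))
```

That is several times more than a full-speed USB link can carry, before even considering what else is on the bus.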


Running the chip at 3000000 baud (changing 1000 to 3000000 in the ftdi_set_baudrate call) does not work either.
Not only do we not have enough USB bandwidth (which can be seen as the sporadic blocks of data in the top part), but we only get 0.5μs long pulses, not 0.33μs as expected.
The pulse widths are better than at lower speeds, but still not stable.

We can only hope that vendors will soon sell out of their devices using the A and B revisions of these chips, so that we can get some breakout boards with revision C, which supposedly will work as advertised.


2011-10-30

Scary beautiful.

This is a graph of include statements in a project I'm currently working on.
I generated this because I'm trying to reduce the time it takes to compile, and I like the look of the dot produced graph, so I wanted to share it.

The small squares are source files, and their brightness indicates how many headers they include.

The circles are header files, and their brightness tells how often they are included during a clean build.



I'm currently playing with include-what-you-use to clean up the code a bit, and after running it on parts of the code tree, it reduced the build time by almost 30%. Not bad!

2011-03-02

Norgo NGE101 and Google PowerMeter scripts.


I just uploaded the first (rough) version of the arduino code and scripts that I use for tracking my personal power usage.

The code can be found on github.

I don't expect anybody else to be able to use the code as is yet, but it should be a good starting point.

The code has only been tested on Linux (Ubuntu), and it will most likely require some changes before it's possible to build and run on Windows.

2011-02-14

Kenwood TH-D72



To celebrate that I got my HAM license back, I bought myself a new toy, a dual band 2m and 70cm Kenwood TH-D72.

I have not been an active HAM operator for almost 20 years, but besides the complexity of the toys :) not much seems to have changed.

Back then, I was playing around with ax.25 packet radio on 2m running at a whopping 1200 baud. Today, we are still using 1200 baud, but the Terminal Node Controller is now built into some high-end handsets.

Here in Denmark, there is not much old-school packet radio activity, but here in Copenhagen, I can hear lots of APRS traffic. APRS is built on top of the old ax.25 protocol, and is mostly used to report information about HAMs and their gadgets. All the APRS traffic is bounced around the network a bit, and the packets that manage to hit an internet gateway are then collected for anybody to study. One popular site that visualizes this traffic is APRS.fi.

In an age where we are all carrying internet capable phones, it might seem strange to be fascinated by something as silly as 1200 baud data connections with limited range, but I'm enthralled by the openness of most of it.

One thing that is not open, at all, is the radios we can buy, and even though we are allowed to build and use our own equipment, it's hard to beat the size and features of the mass-produced handsets that exist.

This is also true for the D72. It has a USB interface that's used for firmware updates, GPS positions, packet radio TNC and configuring the radio, but unfortunately Kenwood have not documented most of the capabilities of the USB interface. By messing around a bit, I managed to find some commands, but I'm sure there are many more.

I hope to find some time in the near future to see if I can't wrestle the firmware out of the D72 and take a closer look at what makes it tick, and discover some more commands, so that I can control the radio from linux.

2011-01-31

Adding data to a CouchDB from Arduino.

Sometimes when using the Arduino to collect data, I need to store this data somewhere. Usually I just send the data over the usb/serial link and then have a python script running on my computer that collects the data and stores it in a database.
But now that I have an ethernet shield for my arduino, I figured I would try to remove one step from that equation.

Some time ago, as part of my work, I looked at different NoSQL databases, and one database caught my eye. Even though it did not fit the problem I had then, CouchDB still intrigued me. I especially liked the cached views and built-in map/reduce functionality.

Unlike most other databases, which use proprietary binary protocols, CouchDB has a simple HTTP RESTful API. This makes it easy to talk to from the arduino.

Let's get started. First we need to initialize the ethernet board:

#include <WProgram.h>
#include <wiring.h>
#include <HardwareSerial.h>
#include <SPI.h>
#include <Ethernet.h>
#include <utility/socket.h>
#include <avr/eeprom.h>

static const uint8_t g_gateway[4] = {0, 0, 0, 0};
static const uint8_t g_subnet[4] = {255, 255, 255, 0};
static const uint8_t g_ip[4] = {192, 168, 1, 170}; // change me!
static const uint8_t g_mac[6] = {0x90, 0xa2, 0xda, 0x00, 0x25, 0x65};

void setup()
{
  Serial.begin(115200*8);
  Serial.println("GO");

  W5100.init();
  W5100.setMACAddress((uint8_t*)g_mac);
  W5100.setIPAddress((uint8_t*)g_ip);
  W5100.setGatewayIp((uint8_t*)g_gateway);
  W5100.setSubnetMask((uint8_t*)g_subnet);
}

I have chosen to stay away from the higher level ethernet api (Server/Client ...), and instead use the lower level W5100 and socket api. This is mainly to avoid the blocking nature of the Client class.


Unfortunately the WIZnet controller does not know its own MAC address, so we have to hardcode it into the source. You can use almost any made up MAC address, but if you have multiple ethernet shields, you must make sure all used MAC addresses are unique. If you look at the backside of the ethernet shield PCB, you will find that the Arduino people have been nice enough to print a unique MAC address on the board.

You will also need to change the ip address (g_ip) to one that matches your local network.

If you have multiple arduinos and ethernet shields, you might want to program a unique IP and MAC address into the arduino's eeprom. On my boards I use the bytes from 0x3f0. Here is an example of how to read the eeprom:
void setup()
{
  Serial.begin(115200*8);
  Serial.println("GO");

  uint8_t ip_mac[4+6];
  eeprom_read_block(ip_mac, (const void*)0x3f0, sizeof(ip_mac));
  if(ip_mac[0] == 255)
  {
    Serial.println("PANIC: missing IP address");
    for(;;) /**/ ;
  }

  W5100.init();
  W5100.setMACAddress(ip_mac+4);
  W5100.setIPAddress(ip_mac);
  W5100.setGatewayIp((uint8_t*)g_gateway);
  W5100.setSubnetMask((uint8_t*)g_subnet);
}


Next, we have the main loop. This example will read some analog and digital ports and then connect to the CouchDB to store the result in the database called "test1".

#define FD 0
#define DB_NAME "test1"
 
static const uint8_t g_couchAddr[4] = {192, 168, 1, 1};
static const uint16_t g_couchPort = 5984;
 
void loop()
{
  enum {STATE_IDLE, STATE_CONNECTING, STATE_CLOSE_WAIT} netstate = STATE_IDLE;
 
  uint32_t nextsampleat = millis();
  uint16_t lasta0;
  uint16_t lasta1;
  uint8_t lastd2;
  bool hassample = false;
 
  for(;;)
  {
    uint32_t now = millis();
    if(int32_t(now-nextsampleat)>=0)
    {
      Serial.println("sampling value");
      lasta0 = analogRead(0);
      lasta1 = analogRead(1);
      lastd2 = digitalRead(2);
      hassample = true;
      nextsampleat += 10*1000; // 10 secs;
    }
 
    uint8_t sockstatus = W5100.readSnSR(FD);
    switch(netstate)
    {
      case STATE_IDLE:
        if(hassample)
        {
          Serial.println("connecting");
          socket(FD, SnMR::TCP, 1100, 0);
          connect(FD, (uint8_t*)g_couchAddr, g_couchPort);
          netstate = STATE_CONNECTING;
        }
        break;
 
      case STATE_CONNECTING:
        if(sockstatus == SnSR::ESTABLISHED)
        {
          Serial.println("connected, sending doc");
 
          char doc[64];
          unsigned doclen = snprintf_P(doc, sizeof(doc),
            PSTR("{\"a0\":%u, \"a1\":%u, \"d2\":%u}"), lasta0, lasta1, lastd2);
 
          char header[64];
          unsigned headerlen = snprintf_P(header, sizeof(header),
            PSTR("POST /" DB_NAME "/ HTTP/1.0\r\n"));
          send(FD, (const uint8_t*)header, headerlen);
          headerlen = snprintf_P(header, sizeof(header),
            PSTR("Content-Type: application/json\r\n"));
          send(FD, (const uint8_t*)header, headerlen);
          headerlen = snprintf_P(header, sizeof(header),
            PSTR("Content-Length: %u\r\n\r\n"), doclen);
          send(FD, (const uint8_t*)header, headerlen);
 
          send(FD, (const uint8_t*)doc, doclen);
 
          netstate = STATE_CLOSE_WAIT;
        }
        else if(sockstatus == SnSR::CLOSED)
        {
          Serial.println("connection failed");
          hassample = false;
          netstate = STATE_IDLE;
        }
        break;
 
      case STATE_CLOSE_WAIT:
        if(sockstatus == SnSR::CLOSE_WAIT || sockstatus == SnSR::CLOSED)
        {
          Serial.println("connection closed");
          // ignoring http reply since we can't deal with db errors anyway.
          close(FD);
          hassample = false;
          netstate = STATE_IDLE;
        }
        break;
    }
  }
}

Here is what the code does:

  • g_couchAddr and g_couchPort: the address and port of the CouchDB. You most likely need to change these to match your computer's address.
  • the millis() check at the top of the for loop: every 10 seconds some hardware ports are read, and the result is stored away for later transmission to the db.
  • the switch on netstate: the async network state machine.
  • STATE_IDLE: when a new sample is ready, a connection to the database is opened.
  • STATE_CONNECTING with a CLOSED socket: if we fail to connect to the database, the sample is thrown away, and the connection will be retried when a new sample is ready.
  • STATE_CONNECTING with an ESTABLISHED socket, the doc buffer: the CouchDB document is created.
  • the three header sends: creation and sending of the HTTP header.
  • the final send: here the CouchDB document is sent to the server.
  • STATE_CLOSE_WAIT: the connection is closed, and we wait for acknowledgment.
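
To make it easier to see what actually goes over the wire, here is a small Python sketch that builds the same bytes the state machine sends once the socket is established. build_request is just a name I made up for this illustration, and the sample values are the ones from the document shown further down.

```python
def build_request(db_name, a0, a1, d2):
    """Build the raw HTTP/1.0 POST that stores one sample as a JSON document."""
    doc = '{"a0":%d, "a1":%d, "d2":%d}' % (a0, a1, d2)
    header = (
        "POST /%s/ HTTP/1.0\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: %d\r\n"
        "\r\n"                      # blank line ends the header
    ) % (db_name, len(doc))
    return (header + doc).encode("ascii")


request = build_request("test1", 484, 1023, 0)
print(request.decode("ascii"))
```

The Content-Length has to match the document exactly, which is why the arduino code formats the document first and the header afterwards.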


When running the code, CouchDB will start to contain documents much like this:
{
  "_id": "135cfbc9bc4709ba24e4d84b5006ae91",
  "_rev": "1-43e825eaf711e9c7b4aaf4274813d677",
  "a0": 484,
  "a1": 1023,
  "d2": 0
}

Looks nice and all, but for my project I also need to know when each sample was taken. One option is to add a DS1307 real time clock, but since I'm more of a software guy, I chose to let CouchDB add the timestamp.

To insert a timestamp into the document, we use a small javascript snippet called an update handler. An update handler can also add a new document to the database, so we can add and update the document in one database call.

This is the update handler used for my samples. It will create a new document, and add a posix timestamp and the 3 samples.
{
  "updates": {
    "new": "function(doc, req) {
      return [{
        _id:req.uuid, 
        time:(new Date()).getTime()/1000.0, 
        a0:Number(req.form.a0), a1:Number(req.form.a1), d2:Number(req.form.d2)
      }, \"updated\"];
    }"
  }
}
To upload this design document to the database, save it as test1_design.js and use curl:
curl -X PUT http://duff:5984/test1/_design/test -d @test1_design.js


The arduino code must be changed slightly to use this new update handler.

Since we are no longer adding the document directly to the database, but calling the update handler, the post URL must be changed to:
#define DB_NAME "/test1/_design/test/_update/new"

We also need to change the HTTP post data from a JSON document to HTTP post arguments:
unsigned doclen = snprintf_P(doc, sizeof(doc), 
  PSTR("a0=%u;a1=%u;d2=%u"), lasta0, lasta1, lastd2);


And finally, the HTTP content type needs to be changed:
headerlen = snprintf_P(header, sizeof(header), 
  PSTR("Content-Type: application/x-www-form-urlencoded\r\n"));


After running this new version, all new documents will have a timestamp:
{
  "_id": "135cfbc9bc4709ba24e4d84b500ac935",
  "_rev": "1-9d17bbf14ccaac2435deaba4bc1b9ead",
  "time": 1296423519.513,
  "a0": 549,
  "a1": 1023,
  "d2": 0
}
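
The "time" field is seconds since the posix epoch, with the milliseconds as the fractional part. A quick way to turn it into something readable, here in Python and using the value from the document above:

```python
from datetime import datetime, timezone

sample_time = 1296423519.513  # the "time" field from the document above

# Interpret the value as seconds since 1970-01-01 00:00:00 UTC.
when = datetime.fromtimestamp(sample_time, tz=timezone.utc)
print(when.isoformat())
```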


All we need now is a CouchApp to render graphs of the collected data, but that is beyond my current javascript capabilities :)

2011-01-25

NGE101 – Norgo wireless energy meter (part 6)

In the previous posts I showed the bits in the captured frames in the order they were captured and decoded, i.e. the first received bit was shown on the left.
When I managed to decode the main payload in the frames, I learned that the frames are sent with the least significant bit (lsb) first, so I was actually showing the captured frames in "reverse". This ended up being a bit confusing, since binary numbers are normally shown with the msb on the left and the lsb on the right. In this post I will therefore show the frames reversed compared to the previous posts, so what looked like:

preamble/header      payload                              checksum
11111010100011001001 110010111110011111010000000000000000 110100111110010
11111010100011001001 000101111110011111010000000000000000 000111100001010
11111010000011001001 00100111111100000000                 111000010110010
11111010000011001001 01100111111100000000                 111001010010000

will now look like this:

checksum        payload                              header/preamble
010011111001011 000000000000000010111110011111010011 10010011000101011111
010100001111000 000000000000000010111110011111101000 10010011000101011111
010011010000111                 00000000111111100100 10010011000001011111
000010010100111                 00000000111111100110 10010011000001011111
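
The conversion between the two notations is nothing more than reversing the whole bit string. A minimal Python sketch, using the third frame from the two listings above (reverse_frame is a made-up helper name):

```python
def reverse_frame(bits):
    """Reverse a frame written as '0'/'1' characters; spaces are ignored."""
    return bits.replace(" ", "")[::-1]


# Third frame, as shown in the previous posts (first received bit on the left).
old_style = "11111010000011001001 00100111111100000000 111000010110010"
# The same frame as shown in this post (msb on the left).
new_style = "010011010000111 00000000111111100100 10010011000001011111"

print(reverse_frame(old_style) == new_style.replace(" ", ""))
```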

The first thing I wanted to figure out, was how the checksum should be calculated, so I started by capturing and looking at thousands of frames to see if I could find a pattern.

Here are a few frame pairs where just the lowest bit in the payload is different:

111010001010001 00000000110010001010 10010011000001011111
110010101000001 00000000110010001011 10010011000001011111

110000011001000 00000000110100010010 10010011000001011111
111000111011000 00000000110100010011 10010011000001011111

010010010001000 00000000110100010110 10010011000001011111
011010110011000 00000000110100010111 10010011000001011111

If you look closely at the checksum, you might notice that the xor of the two checksums in each pair is the same value: 001000100010000.
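
If you don't feel like doing the xor by hand, a few lines of Python will do it for the three checksum pairs above:

```python
# Checksums of the three frame pairs that differ only in payload bit 0.
pairs = [
    ("111010001010001", "110010101000001"),
    ("110000011001000", "111000111011000"),
    ("010010010001000", "011010110011000"),
]

diffs = [int(a, 2) ^ int(b, 2) for a, b in pairs]
for diff in diffs:
    print(format(diff, "015b"))
```

The fact that flipping one payload bit always flips the same set of checksum bits, no matter what the rest of the frame looks like, is what you would expect from a linear checksum.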

More frames with just one flipped bit in the payload:

  111111101010010 00000000111110111001 10010011000001011111
^ 101110101110010 00000000111110111011 10010011000001011111
= 010001000100000

  110100001000101 00000000111111001000 10010011000001011111
^ 100101001100101 00000000111111001010 10010011000001011111
= 010001000100000

  100010000111000 00000001000000000000 10010011000001011111
^ 000000001111000 00000001000000000100 10010011000001011111
= 100010001000000

  000110110111001 00000001000000011000 10010011000001011111
^ 100100111111001 00000001000000011100 10010011000001011111
= 100010001000000

After some time I managed to find these patterns:

111110000000001 - payload bit 12
011011010000000 - payload bit 11
001101101000000 - payload bit 10
110110100100000 - payload bit 9
001011000010000 - payload bit 8
100101100001000 - payload bit 7
100010100000100 - payload bit 6
000001000000010 - payload bit 5
100000100000001 - payload bit 4
000100010000000 - payload bit 3
100010001000000 - payload bit 2
010001000100000 - payload bit 1
001000100010000 - payload bit 0
110100000001000 - header bit n
111010000000100 - header bit n-1
011101000000010 - header bit n-2

To me, it looked very much like these xor patterns were generated by a "many-to-many" linear feedback shift register (LFSR), but no matter how much I tried, I could not figure it out.

So to crack the LFSR, I wrote a C++ program that used a genetic algorithm to find the LFSR taps:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include <algorithm>
 
typedef unsigned int uint;
 
uint patterns[] = {
  0b111110000000001, 0b011011010000000, 0b001101101000000, 0b110110100100000,
  0b001011000010000, 0b100101100001000, 0b100010100000100, 0b000001000000010,
  0b100000100000001, 0b000100010000000, 0b100010001000000, 0b010001000100000,
  0b001000100010000, 0b110100000001000, 0b111010000000100, 0b011101000000010,
};
static const uint pattern_cnt = (sizeof(patterns)/sizeof(patterns[0]));
 
uint calcErrors(uint mask[15])
{
  unsigned errors = 0;
  for(unsigned i=0; i<pattern_cnt-1; i++)
  {
    unsigned ipattern = patterns[i];
    unsigned opattern = patterns[i+1];
    unsigned pattern = ipattern;
    pattern = (pattern>>1);
    for(int biti=0; biti<15; biti++)
    {
      if(ipattern&(1<<biti))
        pattern ^= mask[biti];
    }
    errors += __builtin_popcount(pattern^opattern);
  }
 
  return errors;
}
 
struct genome
{
  uint mask[15];
  uint errors;
  bool operator<(const genome &other) const { return errors<other.errors; }
};
 
static const uint POP_SIZE = 10000;
 
int main()
{
  srand(time(NULL));
 
  static genome population[POP_SIZE];
  for(uint i=0; i<POP_SIZE; i++)
    for(uint j=0; j<15; j++)
      population[i].mask[j] = rand()&0x7fff;
 
  for(uint generation=0; generation<1000000; generation++)
  {
    for(uint i=0; i<POP_SIZE; i++)
      population[i].errors = calcErrors(population[i].mask);
 
    // sort genomes, so that the most fit will be in the beginning of the array
    std::sort(population, population+POP_SIZE);
 
    if(population[0].errors == 0)
    {
      for(uint j=0; j<15; j++) printf("%04x ", population[0].mask[j]);
      printf("\n");
      break;
    }
 
    static genome newpopulation[POP_SIZE];
    for(uint i=0; i<POP_SIZE; i++)
    {
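      // cubing skews the selection toward the fittest genomes; dividing by
      // RAND_MAX+1.0 keeps the index strictly below POP_SIZE
      uint parent1 = pow(double(rand())/(RAND_MAX+1.0), 3.0)*POP_SIZE;
      uint parent2 = pow(double(rand())/(RAND_MAX+1.0), 3.0)*POP_SIZE;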
 
      // produce offspring:
      memcpy(&newpopulation[i], &population[parent1], sizeof(newpopulation[i]));
      for(uint j=0; j<15; j++)
        if(rand()<RAND_MAX/2)
          newpopulation[i].mask[j] = population[parent2].mask[j];
 
      // mutate offspring
      if(rand()<RAND_MAX/10)
      {
        uint bit = rand()%15;
        uint mask = rand();
        for(int j=0; j<15; j++)
          if((mask>>j)&1)
            newpopulation[i].mask[j] ^= 1<<(bit%15);
      }
    }
    memcpy(population, newpopulation, sizeof(population));
  }
  return 0;
}

It only took the program around 15 seconds to find the correct "taps": 0x4880 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x2080 0x4000 0x4000 0x4000 0x4000 0x4000 0x4000.
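
A quick way to convince yourself that these taps are right is to run one LFSR step on each observed xor pattern and check that it produces the next pattern in the list. The sketch below does that in Python; step uses the same update rule as calcErrors in the C++ program, and the names are only for illustration.

```python
TAPS = (0x4880, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x2080, 0x4000, 0x4000, 0x4000, 0x4000, 0x4000, 0x4000)

# The observed xor patterns, from payload bit 12 down to header bit n-2.
PATTERNS = [int(p, 2) for p in (
    "111110000000001", "011011010000000", "001101101000000", "110110100100000",
    "001011000010000", "100101100001000", "100010100000100", "000001000000010",
    "100000100000001", "000100010000000", "100010001000000", "010001000100000",
    "001000100010000", "110100000001000", "111010000000100", "011101000000010",
)]


def step(state):
    """One LFSR step: shift right, then xor in the tap word of every set bit."""
    nxt = state >> 1
    for bit in range(15):
        if state & (1 << bit):
            nxt ^= TAPS[bit]
    return nxt


mismatches = [i for i in range(len(PATTERNS) - 1)
              if step(PATTERNS[i]) != PATTERNS[i + 1]]
print("mismatches:", mismatches)
```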

To check that I now had enough information to verify the checksum of captured frames, I wrote another small python script that calculates the checksum for a frame, and checks it against the checksum in the captured frame:

checksum_taps = (0x4880, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
  0x2080, 0x4000, 0x4000, 0x4000, 0x4000, 0x4000, 0x4000)
 
def next_mask(mask):
  next_mask = mask>>1
  for i in range(15):
    if mask&(1<<i):
      next_mask ^= checksum_taps[i]
  return next_mask
 
def verify(checksum, data, datalen):
  mask = 0x0001
  cchecksum = 0
  for i in range(datalen-1, 7, -1):
    mask = next_mask(mask)
    if (data>>i)&1:
      cchecksum ^= mask
  assert checksum == cchecksum
 
#        |checksum-----|    |payload---------------------------||header----||preamb|
verify(0b010011111001011, 0b00000000000000001011111001111101001110010011000101011111, 36+20)
verify(0b010100001111000, 0b00000000000000001011111001111110100010010011000101011111, 36+20)
verify(0b010011010000111,                 0b0000000011111110010010010011000001011111, 20+20)
verify(0b000010010100111,                 0b0000000011111110011010010011000001011111, 20+20)

By trial and error, I also managed to find out that the preamble is 8 bits long, and not included in the checksum, so the frames now look like this:

checksum        payload                              header       preamble
010011111001011 000000000000000010111110011111010011 100100110001 01011111
010100001111000 000000000000000010111110011111101000 100100110001 01011111
010011010000111                 00000000111111100100 100100110000 01011111
000010010100111                 00000000111111100110 100100110000 01011111

Now that I can generate frames with correct checksums, I can start sending my own frames to the NGE101 receiver and see how it reacts when the different bits are set. This will make it easier to discover what all the bits in the header and payload are used for.
But that will have to wait for another day :)
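
For reference, generating a checksum is just the verify function turned around. A minimal sketch, assuming the frame is given as a bit string with the payload followed by the 12 header bits and the preamble left out (checksum and step are illustrative names, and the taps are the ones found above):

```python
TAPS = (0x4880, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
        0x2080, 0x4000, 0x4000, 0x4000, 0x4000, 0x4000, 0x4000)


def step(state):
    """One LFSR step: shift right, then xor in the tap word of every set bit."""
    nxt = state >> 1
    for bit in range(15):
        if state & (1 << bit):
            nxt ^= TAPS[bit]
    return nxt


def checksum(bits):
    """Checksum of payload+header bits, msb first; spaces are ignored."""
    state = 0x0001
    result = 0
    for bit in bits.replace(" ", ""):
        state = step(state)      # advance the mask once per data bit
        if bit == "1":
            result ^= state      # set bits contribute their current mask
    return result


# Payload and header of the third frame in the table above.
print(format(checksum("00000000111111100100 100100110000"), "015b"))
```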