Newsrooms vs. the Volcano

Over in Geneva, the EBU Radio News Conference 2010 is underway, and I’m watching from afar via the wonders of Twitter.

Late yesterday, Michael Good of RTE talked about how they covered the Eyjafjallajökull eruption and, finding that the “public wanted more than radio programmes could give”, had to turn to the web and networked journalism to improve coverage. Charlie Beckett reports:

In the final session it was made clear by speakers such as Michael Good of RTE that mainstream media can’t cope with big complex crisis stories such as the volcanic ash story: ‘ the public wanted more than radio programmes could give’

RTE responded by using social media connected to their coverage to fill the gaps and to tell the micro as well as the macro story. To provide context as well as drama, information as well as narrative. As Michael put it, it showed how social media has to be at the heart of the newsroom.

Brett Spencer also reported that “SWR say if it happened again right now they would approach the science and the experts with more caution” and “Richard Clark of the BBC Newsroom says an awful lot of experts got airtime who actually didn’t know very much.”

As someone who followed Eyjafjallajökull’s progress from the beginning of the first ‘tourist eruption’ right the way through to the final gasps of the phreatomagmatic eruption (i.e. the big explosive bit), I can say with some certainty that the mainstream media did a pretty appalling job of choosing experts to talk about the eruption. Often, they chose to speak to industry representatives, such as union leaders or airline owners, who knew very little about the eruption itself but had very strong views on what they thought reality ought to be. They also had a vested interest in portraying the situation in a particular light.

I was particularly disgusted by people like Richard Branson, who threw a strop because he thought the flight ban was unnecessary. The BBC's report showed Branson being either disingenuous or dangerously ignorant:

Virgin Group chairman Sir Richard Branson meanwhile told the BBC that he believed governments would be unlikely to impose a blanket ban again.

“I think if they’d sent up planes immediately to see whether the ash was actually too dangerous to fly through or to look for corridors where it wasn’t very thick, I think that we would have been back flying a lot sooner,” he said.

This fundamentally misrepresents the monitoring that was going on at the time (planes were being sent up to look at the ash cloud) and, more importantly, fundamentally misunderstands the nature of ash clouds. They are not a uniform blanket of ash floating through the air, but a constantly changing area of high and low ash densities: Any ‘corridor’ there today probably wouldn’t be there tomorrow.

But in the scramble for experts, no one flubbed quite as badly as the Wall Street Journal and CNN, who both featured Robert “R.B.” Trombley, a self-styled volcanologist who turned out to be not quite the expert they had assumed.

Going back to #RNews10, Charlie Beckett said, “Yes the volcano exposed limits of MSM & value of social media bt it also exposed lack of data transparency from airlines, govt etc” to which Mike Mullane replied, “Beckett: Don’t beat yourselves up. There was failure on the part of governments and meteorologists to provide data for journalists”. And, in a related point, Andy Carvin Tweeted, “Don’t think anyone mentioned maps, though, whether newsroom generated, user-generated or both. Were there any?”

Mike and Charlie's assertions are only true for the UK and the air travel industry: The airlines were, unsurprisingly, entirely opaque. The UK Met Office had some data, particularly on ash measurement and predictions, but could have done a much better job of communicating what they were doing and providing data. That's a problem they seriously need to fix: They opened themselves up to undeserved criticism because no one had any idea what they were actually doing. The Civil Aviation Authority and the National Air Traffic Services should also be soundly criticised for appalling communications. Their online information and data was not well organised, to say the least.

But there was a huge amount of data coming out of other sources, particularly the Icelandic Met Office, which the mainstream media completely ignored. The IMO was providing near-live earthquake data for the Mýrdalsjökull area, which includes the Eyjafjallajökull icecap, available as a map or a data table. And, as I discovered when I did this myself, if you sent them a nice email they would send you the raw data to play with. There is no reason why the media could not have contacted the IMO and used some of this data in visualisations for their coverage, like this one done by DataMarket.com:
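
To give a sense of how low the barrier was, here is a minimal sketch of the sort of thing a news developer could have knocked up, assuming the IMO data arrived as a simple CSV of times and magnitudes (the file name and column names are my guesses, not the IMO's actual format):

```python
# Sketch: plot earthquake magnitudes over time from a CSV export.
# The file name and column names below are assumptions for illustration.
import csv
from datetime import datetime

import matplotlib.pyplot as plt

times, magnitudes = [], []
with open("myrdalsjokull_quakes.csv", newline="") as f:  # hypothetical export
    for row in csv.DictReader(f):
        times.append(datetime.fromisoformat(row["time"]))
        magnitudes.append(float(row["magnitude"]))

plt.scatter(times, magnitudes, s=10)
plt.xlabel("Date")
plt.ylabel("Magnitude")
plt.title("Earthquakes under Mýrdalsjökull (illustrative)")
plt.tight_layout()
plt.show()
```

Not a polished graphic, obviously, but it shows how little stands between the raw data and something you could publish.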

There was quite a lot of ash forecast data coming out of various different institutes, primarily the UK Met Office. There were videos (search for Eyjafjallajökull) and photos taken by scientists, tourists, locals and the Icelandic news organisations (whose coverage was obviously much better). There were multiple live webcams, and volcano enthusiasts captured and shared webcam timelapses showing the eruption and jökulhlaups (flash floods of ash and meltwater) on a daily basis. There was even a cut-out-and-keep model of the volcano, made by the British Geological Survey.

And there was some flight data available, as exemplified by this fabulous timelapse of the European flights resuming after the ban:

The problem was that most news journalists, obviously, do not have the kind of specialist knowledge to be able to assess sources, experts, or data for an event that is so far outside of their usual field of experience. I understand that journalists can't be experts in everything, but I do expect them to know how to find information, sources and data, and to do so reliably.

But they seemed oblivious to the online communities that were following this eruption closely and where there were people who could have helped them. I spent a lot of time on Erik Klemetti's wonderful blog, Eruptions (new site, old site). Erik, a volcanologist at Denison University in Ohio, played host to a community of scientists and amateurs who discussed developments in detail and answered questions that people had about how all this volcano stuff really works.

Although I was on that blog almost every day, I don't remember a single journalist ever asking in the comments for help in finding information or understanding its implications. I do remember, however, a lot of people popping in to get clarification on the misinformation promulgated by the media, particularly rumours that Eyjafjallajökull's neighbour, Katla, was about to erupt.

The truth is that Eyjafjallajökull was probably the best observed, monitored and recorded eruption in history. The sheer volume of data produced was enormous. And the mainstream media ignored everything but the pretty pictures.

A comment on comments

In July last year, I gave a lunchtime talk to the BBC World Service about the meaning of 'social' online, the problems that we face with commenting on news sites, and the way I thought we needed to consider social functionality design in the news arena.

I opened with a couple of videos: The infamous Mitchell and Webb “What do you reckon?” sketch that has served both Kevin and me so well in our presentations, and a Sky News ident promoting their discussion forums.

My point was that, since the earliest days, news websites have seen interactive parts of their sites, like comments or forums, as a place for a damn good punch-up. And those who thought that they were providing a valuable place for feedback and discussion found that they had actually created toxic environments. I probably (although I don’t remember) mentioned Comment is Free as the archetypal pit of vipers. I usually do.

I went on to discuss the core concepts of social objects, relationships, trust and privacy, and had a stab at attacking one of the core misunderstandings the media has about community: Your audience is not a community.

After attempting to run through what these concepts mean, and how they affect social website design, I went on to emphasise why this is important. From my notes at the time:

Bad community reflects badly on your brand.

A community of fringe voices is alienating and unconstructive, and opens your brand up to ridicule.

I closed with the point that designing for social interaction is not just a matter of slapping comments on everything, but requires forethought and a deep understanding of the nature of ‘social’.

The first question was asked by Peter Horrocks, the Director of BBC World Service. He asked if I could give them examples of any news organisation that had done it properly. I replied that, as far as I was aware, no news organisation had taken the necessary steps to create social functionality worthy of note.

The first parts of news sites to get comments were the early blogs, many of them run on Typepad or Movable Type, which was far and away the best platform at the time. This was before WordPress and before specialist commenting systems, so dealing with spam and moderating comments could be arduous, but most blogs had niche audiences who tended to behave better, partly because they actually got to know one another.

Then other parts of the news organisations heard the siren call of the comment, and before you knew it, they were everywhere. You could leave a comment on almost every news story you stumbled upon, regardless of whether commenting was appropriate. Stories of murders and rapes and disasters asked you, “What do you reckon?”, and people reckoned away.

I have never seen any evidence that news organisations take the problem of community seriously enough. For them, the more comments a piece got, the more page views it generated and the higher they could push their ad rates. So long as nothing was libellous, hey, go for it.

Kevin has said that most news orgs don’t have an engagement strategy, they have an enragement strategy. Community strategies have been focused more on how to keep moderation costs down whilst increasing comments, rather than going back to first principles and figuring out what comments are really for, understanding people’s behaviour in comment areas, and then designing a tool which helps facilitate positive behaviours and reduce the potency of negative ones.

In the half-decade since news organisations discovered commenting, they have failed to fully understand it or to modify their systems appropriately.

Now Reuters has finally taken a step in the right direction by adding a rating system that awards points for good comments and then, eventually, allows the user to earn extra privileges (which they can also lose through bad behaviour). They have also added profile pages which aggregate comments and provide a count of how many have been accepted, removed or reported for abuse.
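
Mechanically, it's a simple idea. Here's a toy sketch of how such a points-and-privileges system might hang together; the point values and thresholds are my own invention, not Reuters':

```python
# Toy sketch of a points-and-privileges comment system.
# The point values and VIP threshold are invented for illustration.
class Commenter:
    VIP_THRESHOLD = 100  # assumed value

    def __init__(self, name):
        self.name = name
        self.points = 0
        self.accepted = 0
        self.removed = 0
        self.reported = 0

    @property
    def is_vip(self):
        """A VIP might, for example, skip pre-moderation."""
        return self.points >= self.VIP_THRESHOLD

    def comment_accepted(self):
        self.accepted += 1
        self.points += 5                        # reward good behaviour

    def comment_removed(self, reported=False):
        self.removed += 1
        self.reported += int(reported)
        self.points = max(0, self.points - 20)  # privileges can be lost again


user = Commenter("example")
user.comment_accepted()
print(user.points, user.is_vip)
```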

That is a good start, but it is just a start. It will be interesting to see what effect their basic rating system will have. Whenever one is rewarding a behaviour, one has to think about how that reward system can be gamed and what unintended consequences might result.

In this case, I can see how a user might put a lot of effort into building up a large stash of points through adding a lot of easy, unobjectionable content in order to get to a VIP user status which they can then abuse. Yes, they’ll be punished for that abuse but not until some of their abusive comments have been published straight to the web.

Why would someone go to all that trouble? On the web, no reason is required other than “Because I can”.

Reuters’ system may help slow down the toxicity of news site comments, but it isn’t the full Monty. It doesn’t address how people might come to form positive relationships via their site. It doesn’t consider how trust between readers (or readers and journalists) may develop or be eroded. It doesn’t think about the social objects around which people may want to interact (hint: the story is not the atomic unit of news). It doesn’t do anything to develop a true community.

On privacy, at least, it is neutral. Contrary to the position of one commenter on Baum’s blog post, if you post lots of stuff in public, having that stuff aggregated into one spot is not an invasion of your privacy and is not speech-chilling. If you are ashamed of what your comments collected say about you, perhaps you ought to think a bit more about what you say.

So, Reuters get a point for trying, but which news organisation is going to really grasp the nettle and do interaction properly?

The social side of citizen science

I spent last Thursday and Friday at the Citizen Cyberscience Summit, listening to a series of presentations about how the public are collaborating with scientists to achieve together what neither group can do alone. It was a fascinating couple of days which illustrated the vast variety of projects either running currently or in the pipeline. We’ve all heard of SETI@home, but there are projects now across a diverse set of disciplines, from botany to history, astronomy, meteorology, particle physics, seismology and beyond.

What was notable, however, was that the majority of the projects were about volunteers donating CPU cycles rather than brain cycles. Where communities were mentioned it was generally in passing, and when community tools were mentioned they were almost invariably forums/bulletin boards.

I had hoped to hear more from the different projects about community churn, retention tactics, development tactics, social tools, and other such things, but was not totally surprised to see that most presentations focused on the science instead. There was a discussion session scheduled for Friday evening to talk some of these issues through, but I sadly couldn't stay for it. Nevertheless, I think that the social and community aspects should have been discussed throughout the two days.

It is obvious that there is tremendous overlap of interests between the citizen science community and the social collaboration community, and there are lessons both parties could learn from each other. I’d love to see some sort of round-table organised that brought the two communities together to discuss some of the issues that citizen science faces. In lieu of that, here are a few ideas to hopefully get an online discussion going.

The forum is not the only tool
I don’t think it’s a surprise that those projects which do have a community component tend towards having a forum of some sort. They’ve been around for ages and for many people they are the default discussion tool. However, we’ve come a long way since the forum was invented and there are many social tools that are more suited to certain types of tasks.

Wikis, for example, are much better for collecting static (or slowly evolving) information such as help pages. Blogs are good for ongoing updates and discussion around them. UserVoice is great for gathering feedback on your website or software. A community is a multi-faceted thing, so it often needs more than just one tool.

Facebook is not a panacea 
During lunch on Friday I did get to talk to some of the other attendees about social media. Facebook, of course, came up. Whilst Facebook is a massive social network, one has to be very careful how one uses it, otherwise it can be a massive waste of time. Facebook Causes, for example, was said by the Washington Post to have raised money for only a tiny percentage of the nonprofits that used it. I myself have seen how Facebook encourages ‘clicktivism’ – the aimless joining of a group or cause that isn’t followed up by any meaningful action.

Facebook as a platform, however, is a more interesting proposition. Facebook Connect allows users to log in to your site using Facebook and lets your site post updates for the user to their wall. And Facebook apps may allow citizen science to be done actually on Facebook rather than requiring users to go to another site. In this way, Facebook shows promise, but starting a group or a page and hoping that people will just go off and recruit users to your project is unlikely to be successful.

Twitter is a network of networks
Where Facebook is sitting in the kitchen being introspective over a can of cider, Twitter is the extrovert at the party. Although Facebook has more users (~500m), Twitter is now at ~150m users and growing at 300k per day. More to the point, however, Twitter is easy to use, more open, and Tweets that go viral really do go viral because it’s not just your network you’re reaching, but a network of networks. The potential value for recruitment and retention is huge, if you do it right.

Design apps to be social from the beginning
If you’re creating software for users to download and run, think about how you could make that social. The social aspects to your project don’t need to be managed exclusively on a separate website or third party software. If it makes sense for what you are doing, build in sociability.

Most of these tools are free
I’m guessing that most citizen science projects have little funding. Where social media is concerned, the good news is that the vast majority of key tools are free. The not-so-good news is that you do need to understand how to use them, which could take some investment in terms of training and consulting, and you need time to maintain your online presence. A good consultant will help you understand how to work social media into your work life so that it doesn’t become a drain on resources, but you must have some time to commit to it.

This is where JISC and other funding bodies could really help: by allocating specific funds to raising awareness of social tools in the science community, providing training, ensuring that projects can afford to work with outside social media consultants, and even by helping project leaders understand how to find a good social media consultant (sadly, there are lots of carpetbaggers).

The opportunity afforded to citizen science by social media is enormous, regardless of whether a project is focused on CPU time or more human-scale tasks. Now let’s start talking about how to realise that potential!

Janos Barbero, The challenge of scientific discovery games

FoldIt is a protein folding video game. Proteins are chains of amino acids, and they form a unique 3D structure which is key to their function.

Distributed computing isn’t enough to understand protein structures. Game where you try to fold the protein yourself. Game design is difficult, but even more difficult when constrained by the scientific problem you are trying to solve. You can’t take out the fiddly bits. But players have to stay engaged.

Approach the game development as science. Collect data on how people progress through the game, so that the training can be changed to make the difficult bits easier to learn. Also use that info to improve the tools. Had a lot of interaction with and feedback from the players.

Also analyse how people use the different game-tools to do the folding, and see that two in particular were used consistently by successful players.
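
In outline that sort of analysis can be very simple: log which tool each player uses, then compare the tool mix of high scorers with everyone else. A toy sketch, with made-up players, tools and scores:

```python
# Toy sketch: which tools do high-scoring players favour?
# Player names, tool names and scores are all made up.
from collections import Counter

# (player, tool_used, final_score) event records
events = [
    ("alice", "wiggle", 9200), ("alice", "shake", 9200),
    ("bob",   "wiggle", 8700), ("bob",   "rebuild", 8700),
    ("carol", "rubber_bands", 6100), ("carol", "shake", 6100),
]

threshold = 8000  # arbitrary cut-off for "successful" players
top = Counter(tool for _, tool, score in events if score >= threshold)
rest = Counter(tool for _, tool, score in events if score < threshold)

print("Top players:", top.most_common())
print("Others:     ", rest.most_common())
```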

Emergence of game community. Seeing people getting engaged. Had a fairly broad appeal, demographics similar to World of Warcraft.

Second milestone was when players started beating the biochemists, emergence of ‘protein savants’, had great intuition about the proteins, but couldn’t always explain it.

Have a game wiki so people can share their game playing strategy. Each player has a different approach, can use different game-tools. People develop different strategies for different stages of the game.

Humans are comparable or better than computers at this task.

Multiplayer game, they form groups or clans which self-organise, many groups have people who focus on the first phase, others focus on the endgame.

Effect of competition, as one person makes progress, others try to keep up.

Users can share solutions, network amplification.

Humans have a completely different strategy to computers: they can make huge leaps that computers can’t, often going through bad structures that lead to good ones.

FoldIt is just the first step in getting games to do scientific work. Problem solving and learning through game play. Looking to find ways to train people into experts, generalise to all spatial problems, streamline game synthesis for all problems, and create policies, algorithms or protocols from player strategies.

Expand from problem solving to creativity. Potential for drug design, nano machines and nano design, molecular design. Aim is to create novel protein/enzyme/drug/vaccine that wouldn’t be seen in nature.

Also want to integrate games into the scientific process. Design cycle: pose problem, get public interest, run puzzle, evaluate-analyse-modify-run-repeat, publish.

Elizabeth Cochran, Distributed Sensing: using volunteer computing to monitor earthquakes around the world

Quake-Catcher Network: Using distributed sensors to record earthquakes, to put that data into existing regional seismic networks.

Aim: To better understand earthquakes and mitigate seismic risk by increasing density of seismic observations.

Uses new low-cost sensors that measure acceleration, so can see how much ground shakes during earthquakes. Using BOINC platform. Need volunteers to run sensors, or laptop with sensors.

Why do we need this extra seismic data? Need an idea of what the seismic risk is in an area, look at the major fault systems, population density, and type of buildings.

Where are the faults? Want the sensors in places where earthquakes occur. GSHAP map, shows areas of high seismic risk near plate boundaries. Most concerned with population centres, want sensors where people are, so can get community involved. Looking at cities of over 1m people in areas of high seismic risk.

Construction standards in some areas mean buildings can withstand shaking. But two very large earthquakes took place this year: Haiti was a big problem because they have infrequent earthquakes and very low building standards. Chile had relatively few deaths; even though there was some damage, the buildings remained standing.

Seismic risk, look at what happens in the earthquake fault. Simulation of San Andreas fault, shows how much slip, a lot of complexity in a rupture. Very high amplitude in LA basin because it’s very soft sediment which shakes a lot.

Need to figure out how buildings respond. Built 7 storey building on a shake table and shook it, with sensors in and recorded what happened to it. Shake table can’t replicate real earthquakes perfectly. Also have many different types of structure so hard to get the data for them all.

Instead, use sophisticated modelling to understand what happens along the fault, propagation, and building reaction.

Simulations now much more detailed than observed, so no way to check them.

Need to add additional sensors. Seismic stations run upwards of $100k each. Can’t get millions of dollars to put up a sensor net.

Instead use accelerometers that are in laptops, e.g. Apple, ThinkPad, which are used to park the hard drive when you drop them. Can tap into that with software in the background to monitor acceleration. Can record if laptop falls off desk or if there’s an earthquake.

External sensors can be plugged into any computer, cost $30 – $100 each, so inexpensive to put into schools, homes etc. Attached by USB.

Challenges:

Location, if you have a laptop you move about, so need to locate the laptop by IP, but the user can also input their location, which is more exact than IP. And user can enter multiple locations, e.g. work, home.

Timing, there’s no GPS clock in most computers, and want to know exactly when a particular seismic wave arrives, so use the network time protocol and pings to find the right time.

Noise, get much more noise in the data than a traditional sensor, e.g. laptop bouncing on a lap. Look at clusters. If one laptop falls on a floor, they can ignore it, but if waves of laptops shake and the waves move at the right speed, they have an event.
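
A rough sketch of that idea (not QCN’s actual algorithm, and the speeds and tolerances are made up): triggers only count as an event if their arrival times match a wave sweeping across the network at a plausible seismic speed.

```python
# Sketch: do these triggers look like a wave crossing the network,
# or just isolated noise? Speeds and tolerances are illustrative only.
from math import hypot

WAVE_SPEED_KM_S = 3.5   # rough S-wave speed in the crust
TOLERANCE_S = 2.0

def consistent_with_wave(triggers, origin):
    """triggers: list of (x_km, y_km, t_s); origin: (x_km, y_km, t_s)."""
    ox, oy, ot = origin
    for x, y, t in triggers:
        expected = ot + hypot(x - ox, y - oy) / WAVE_SPEED_KM_S
        if abs(t - expected) > TOLERANCE_S:
            return False
    return True

# A laptop knocked off a desk: its timing doesn't fit the wave, so ignore it.
print(consistent_with_wave([(50, 0, 3.0)], origin=(0, 0, 0)))   # False
# Several triggers whose times match the wave's travel time: an event.
print(consistent_with_wave([(7, 0, 2.1), (14, 0, 4.1), (21, 0, 6.2)],
                           origin=(0, 0, 0)))                   # True
```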

Have 1400 participants globally, now trying to intensify network in certain places, e.g. Los Angeles.

Use information for detection of earthquakes, then look at some higher order problems, e.g. earthquake source, wave propagation.

Had one single user in Chile at the time of the earthquake. Software looked at current sensor record and sees if it’s different to previous. Info sent to server after 7 seconds. Soon after earthquake started, internet and power went out, but they did get the data later.

Took new sensors to Chile and distributed them around the area. Put up a webpage asking for volunteers in Chile and got 700 in a week. Had more volunteers than sensors. Had 100 stations installed.

There were many aftershocks, up to M6.7. Don’t often have access to a place with lots of earthquakes happening all at once, so could test data. Looked for aftershock locations, could get them very quickly. Useful for emergency response.

Had stations in the region and some had twice as much shaking as others, gives idea of ground shaking.

Want to have instruments in downtown LA. Have a high-res network in LA already but station density not high enough to look at wave propagation. If put stations in schools, then can get a good network that will show structure of LA basin.

Will also improve understanding of building responses. You can look at dominant frequency that a building shakes at, if that changes then the building has been damaged.

Want to make an earthquake early warning system. An earthquake starts at a given location and the waves propagate out. If you have a station that quickly records the first shaking, and you can get a location and magnitude from that, then because seismic waves travel slower than internet traffic you can get a warning to places further away. The more sensors you have, the quicker you can get the warning out.
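
The back-of-the-envelope arithmetic (wave speed and delays are rough, assumed values):

```python
# Warning time = seismic travel time minus detection and alert delays.
# All numbers here are rough assumptions for illustration.
S_WAVE_SPEED_KM_S = 3.5   # damaging S-waves travel at a few km/s
DETECTION_DELAY_S = 7.0   # e.g. the ~7 s before data reach the server
ALERT_LATENCY_S = 1.0     # assumed network and processing time

def warning_seconds(distance_km):
    return distance_km / S_WAVE_SPEED_KM_S - DETECTION_DELAY_S - ALERT_LATENCY_S

for d in (50, 100, 200):
    print(f"{d} km from the epicentre: ~{warning_seconds(d):.0f} s of warning")
```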

Working with Southern California quake network to see if they can integrate the two sensor networks. Also working with Mexico City to install stations, as currently they only have a few stations. If any one of them goes down, it affects their ability to respond.

Matt Blumberg, Society of Minds – a framework for distributed thinking

GridRepublic, trying to raise awareness of volunteer computing. Provide people with a list of BOINC projects, can manage all your projects in one website.

Progress Thru Processors, trying to reach people on Facebook. Join up, one click process, projects post updates to hopefully reach volunteers’ friends.

Distributed thinking – what can be done if you draw on the intellectual resources of your network instead of just CPUs. How would you have to organise to make use of available cognition.

What is thinking? Marvin Minsky, The Society of Mind: “minds are built from mindless stuff”. Thinking is made up of small processes called agents, intelligence is an emergent quality. Put those agents into a structure in order to get something useful out of them.

Set of primitives

  • Pattern matching/difference identification
  • Categorising/Tagging/Naming
  • Sorting
  • Remembering
  • Observing
  • Questioning
  • Simulating/Predicting
  • Optimising
  • Making analogies
  • Acquiring new processes

Another way of thinking about it, linked stochastic processes, try stuff randomly, then explore those approaches that seem to be giving better results.
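
A toy sketch of that ‘try stuff randomly, then explore what works’ loop, on a stand-in problem:

```python
# Toy sketch of linked stochastic processes: random guesses, then keep
# exploring around whichever guesses score best. The fitness function
# is a stand-in for a real problem.
import random

def fitness(x):
    return -(x - 3.0) ** 2            # best possible guess is x = 3

population = [random.uniform(-10, 10) for _ in range(20)]   # try stuff randomly
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                             # keep the better half
    population = survivors + [x + random.gauss(0, 0.5) for x in survivors]

print(f"best guess: {max(population, key=fitness):.2f}")
```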

Philip Brohan, Volunteer online transcription of historical climate records

Interested in observation, and particularly extreme weather such as torrential rain, storms.

Morning of 16 Oct 1987, Great Storm in SE England, have weather records for that day, coloured by pressure. Low pressure – storminess. Can we understand its dynamics, can we predict it? Take observations and model them.

Previous big storm was 1703, so if we’re interested in the climatology of storms, we need hundreds of years of records, and need them for everywhere in the world. Europe is well represented, but, say, Antarctica is not. Even in 1987, we didn’t have good records for there.

1918, rather badly observed period of time. People were distracted from weather observations (!).

This is the problem we’re trying to solve. We need more weather observations from 1918. Easy part of the problem: the Public Record Office has a tremendous amount of info in its archive. Weather data potentially available if we can extract it.

Ship’s log of HMS Invincible, covers 1914 – 1915. Records actions each hour, and takes weather observations every 4 hours, six per day. Full weather obs. World’s collective archives have millions of these observations, and they are tremendously useful.

Started photographing the logbooks, 250k images. Tried OCR, doesn’t work. Using citizen science project to solve this problem.

Working with the people at Zooniverse, collaborating with them for 5 months. Funded by JISC.

At the moment developing the systems, Old Weather won’t be live for another month. You can pick a ship, join the crew of that ship and start to extract that information from its logbook: date, location, weather information. Doing some beta tests at the moment, hope that in a few weeks time it’ll be launched as a real project.

There is other information in these log books that might be of more interest to others. Expecting to find a lot of this sort of data, e.g. Invincible on 8 Dec 1914: at 5am it was engaged in the Battle of the Falkland Islands.

Mostly, don’t know what is in these log books, so need to find out.

[I’m personally very excited about this project as I’m working with the Zooniverse chaps on a small part of it, so very pleased to see it’s close to launch!]

Mark Hedges, Sustaining archives

Archives, physical or digital. All sorts of documents, but many are important to historians, e.g. scraps of paper from early days of computing can be very important later on.

Time consuming to find things. Dangers to sustainability – stuff gets lost, thrown away, destroyed by accident or fire.

Digital archives, easier to access, but often funding runs out and we need them to last.

NOF-Digitise programme, ran for 5 years, ended 6 years ago, awarded £50m to 155 projects. What happened to them?

  • 30 websites still exist and have been enhanced since
  • 10 absorbed into larger archives
  • 83 websites exist but haven’t changed in 6 years since project ceased
  • 31 no URL available or doesn’t work.

Archives can die

  • Server fails/vanishes
  • Available but unchanged, becomes obsolescent
  • Content obsolete, new material not included
  • Inadequate metadata
  • Hidden archives, stuff’s there but no one can find it
  • Isolated (from the web of data)

Can we involve the community? Most archives have a focus, so there may be a community interested in it.

Can exploit the interest of specific groups for specific archives, e.g. Flickr tagging of photos. But this can be too libertarian, open to misuse. Not appropriate for more formal archives, e.g. tagging often too loose.

Middle way between professional cataloguers on one hand, free tagging on the other.

Split work up into self-contained tasks that can be sent to volunteers to be performed over the internet. Problem with free tagging is that it’s insufficiently accurate. Use task replication to get consensus, calibration of performance, etc.
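
A minimal sketch of task replication: send the same snippet to several volunteers and only accept a transcription they agree on, escalating anything contentious (the data are invented):

```python
# Sketch: accept a transcription only when enough volunteers agree;
# otherwise return None so the item can be escalated to an expert.
from collections import Counter

def consensus(transcriptions, min_agreement=2):
    value, count = Counter(transcriptions).most_common(1)[0]
    return value if count >= min_agreement else None

print(consensus(["Mr J. Smith", "Mr J. Smith", "Mr I. Smith"]))  # agreed reading
print(consensus(["1887", "1881", "1884"]))                       # no consensus -> None
```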

Apply this methodology to digital archives and cultural heritage collections. Want to sustain and enhance the archives. Want specific communities to adopt archives to ensure longer term prospects.

Very early stage project, TELDAP, a rich archive of Chinese and Taiwanese cultural material. But doesn’t have high visibility worldwide. Needs metadata enhancement, etc.

Great Ormond St Hospital historic case notes, e.g. Dr Garrad, chronological view of his case notes. Transcription, mark up key ideas, cross referencing. Specialised knowledge required, so community is retired nurses, doctors, etc.

East London Theatre Archive Project, contains digitised material from playbills, photos, posters. Images have metadata, but there’s a lot of textual information which hasn’t been extracted and isn’t therefore accessible.

Experimenting with variety of tasks: transcription; identification of ‘special’ text, e.g. cast lists which could be linked to list of actors, or play type.

Some images have text but it’s quite complexly arranged in columns, sections, with embedded pictures. So not entirely easy. It would be useful to divide images into their different sections and classify them according to their nature.

Hybrid approach, OCR them first to produce rough draft, then get volunteer contributions rather than starting with original image.

Ephemeral materials produce very important information.

Communities. Different communities: people with intrinsic interest in topic, e.g. academic, professional; local social communities, e.g. schools; history groups, genealogists; international pool of potential volunteers with E London ancestors.

Size of community less important than having an interest in a particular topic. Important to identify people who have an interest in the fate of the archive. Small groups.

Issues to address. Open-endedness of the tasks makes it hard to assess how well it’s going. Can also attract people with malicious intent.

Want to develop guidelines for this sort of community building.

How are volunteer outputs integrated with professional outputs? Resistance from professionals to anyone else doing stuff.

Having volunteer thinkers as a stage in the project, one could have more complex processes: after the volunteers have done their stuff, the pros can come in to do more specialised XML mark-up, giving a ‘production line’ that makes the best use of everyone’s skills.

Getting communities to participate in related archives might help people preserve their cultural identity in an increasingly globalised world.

David Aanensen, EpiCollect – a generic framework for open data collection using smartphones

Looks at a number of projects, including spatialepidemiology.net which tracks MRSA spread, and Bd-Maps which looks at amphibian health.

Have been developing a smartphone app so that people in the field can add data. Use GPS so location aware, can take in stills/video.

EpiCollect, can submit info and access data others have submitted, and do data filtering. Android and iPhone versions. Very generic method, any questionnaires could be used for any subject.

Fully generic version at EpiCollect.net. Anyone can create a project, design a form for data collection, load the project up, go out and collect data, and then have a mapping interface on website that you can filter by variable. Free and open source, code on Google Code. Use Gmail authentication.

Drag and drop interface to create form from text input, long text, select single option, select multiple.
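
A sketch of what such a form definition might boil down to; the element names, field types and project are my guesses for illustration, not EpiCollect’s actual schema:

```python
# Sketch: build a form definition from field descriptions and serialise
# it to XML. The schema here is invented, not EpiCollect's real format.
import xml.etree.ElementTree as ET

fields = [
    {"name": "species",  "type": "text"},
    {"name": "notes",    "type": "long_text"},
    {"name": "habitat",  "type": "select_one",
     "options": ["forest", "wetland", "urban"]},
    {"name": "symptoms", "type": "select_multiple",
     "options": ["lesions", "lethargy"]},
]

form = ET.Element("form", name="amphibian_survey")   # hypothetical project
for field in fields:
    el = ET.SubElement(form, "field", name=field["name"], type=field["type"])
    for option in field.get("options", []):
        ET.SubElement(el, "option").text = option

print(ET.tostring(form, encoding="unicode"))
```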

iPhone app is free. Can host multiple projects if you want. Once you load the project it transfers the form. Can add multiple entries on your phone. Can attach video, stills, sound. Then data sent to central server. Can actually use it without a SIM card, will save it and then upload over wifi.

Can also edit entries and add new entries via the web interface. Have also included Google Chat, so that you can contact people directly through the web interface.

Data is mapped on Google Maps, which gives you a chance to see distribution, and can click through for further details. Also produces bar graphs and pie charts.

One project was animal surveillance in Kenya and Tanzania. There’s also health facility mapping in Tanzania. Archaeological dig sites in Europe. Plant distribution in Yellowstone National Park, encouraging visitors to collect data. Street art collection, photographing favourite tags.

Very simple to use, so people can develop their own projects.

Open source so you can host it on your own server, just a simple XML definition.

Yuting Chen, Puzzle@home and the minimum information sudoku challenge

Sudoku comes from the Latin square, which dates back to the Middle Ages and was studied by Leonhard Euler. But Sudoku is also related to the colouring problem: how do you colour each node in a pentagram/star so that none has a neighbour of the same colour? Think of Sudoku numbers as colours; each square must be different to its neighbours.

Solving sudoku for all sizes – it’s not just 9 x 9 – is an NP-complete problem, i.e. “damn hard”!

How many solutions does Sudoku have? For the 4 x 4 Latin square there are 576 versions, and for 9 x 9… there are lots and lots, i.e. about 6 x 10^21. Taking out the symmetries, Russell & Jarvis found 5.4bn essentially different solutions.

Sudoku puzzles require clues to define a unique solution. With 4 clues, it might not have a unique solution. So what is the minimum number of clues that will provide a unique solution? Minimum found so far is 17. But is there a 16-clue puzzle? Need a sudoku-checker programme to see if any 16-clue puzzles have unique solutions.

If each solution could be checked in 1 second, it would take 173 years to check all the options, but 1 second per check is not actually feasible.

Fastest checker will still take 2417 CPU years. Volunteer computing can help. Each solution can be checked independently.
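
The heart of any such checker is a solution counter that gives up as soon as it finds a second completion. A naive sketch (nothing like the optimised code the project will actually run):

```python
# Sketch: count completions of a puzzle, stopping at 2. A clue set is a
# valid Sudoku only if exactly one completion exists.
def count_solutions(grid, limit=2):
    """grid: flat list of 81 ints, 0 for an empty cell. Returns 0, 1, or >= 2."""
    try:
        i = grid.index(0)
    except ValueError:
        return 1                                 # no empty cells: one full solution
    row, col = divmod(i, 9)
    box = (row // 3) * 27 + (col // 3) * 3       # index of the box's top-left cell
    used = {grid[row * 9 + c] for c in range(9)}                         # row
    used |= {grid[r * 9 + col] for r in range(9)}                        # column
    used |= {grid[box + r * 9 + c] for r in range(3) for c in range(3)}  # 3x3 box
    total = 0
    for digit in range(1, 10):
        if digit not in used:
            grid[i] = digit
            total += count_solutions(grid, limit)
            grid[i] = 0
            if total >= limit:
                break                            # a second completion: not unique
    return total
```

A 16-clue candidate passes only if count_solutions returns exactly 1, and running anything like this naive version over billions of candidates is exactly what adds up to thousands of CPU years.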

Asia@home is promoting volunteer computing in SE Asia.

Future plans include earthquake hazard maps and medicine design simulations.