Wenjing Wu, Citizen Cyberscience in China: CAS@home

A CAS researcher, focusing on where volunteer computing and volunteer thinking can help. CAS is well known in China, and well trusted.

Chinese volunteer demographics: 42k BOINC users, out of 420m internet users and a total population of 1.33bn. Most volunteers come from the developed eastern part of China. Average age is around 27, 90% are male, and most are students, IT professionals or mid-income workers.

EQUN.com, project started in 2003 to translate and provide information on other volunteer computing projects.

Concerns about volunteer computing:

  • Barriers
    • Language barriers
    • Complication of registration and participation
    • Lack of awareness of science and of the value of contributing
  • Security
    • Internet environment unsafe
    • Piracy
    • Usage of public computers
  • Energy
    • Electricity is largely coal-based – is it worthwhile?
    • Extra air conditioning in hot season
    • High bills
  • China
    • When will China have their own project
    • Now have CAS@home

CAS@home is the first volunteer computing project in China, launched in Jan 2010, based at the Institute of High Energy Physics, Chinese Academy of Sciences. Uses BOINC.

The first application predicts protein structure by comparing proteins against existing structural templates. The templates are independent, so the data can be analysed in parallel.
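Because each template comparison is independent, the work splits naturally into parallel tasks – the same property BOINC exploits when it farms batches of templates out to volunteer PCs. A minimal sketch, with a toy scoring function and made-up names rather than the real CAS@home application:

    # Toy sketch: one independent scoring task per template, so the whole set
    # can be farmed out in parallel (to processes here, to volunteer PCs in BOINC).
    from concurrent.futures import ProcessPoolExecutor

    def score_against_template(target_seq, template_seq):
        # Toy score: matching residues at aligned positions; a real threading
        # application would use a proper alignment/energy function.
        return sum(a == b for a, b in zip(target_seq, template_seq))

    def best_template(target_seq, templates):
        # templates: {template_id: sequence}; every comparison is independent.
        with ProcessPoolExecutor() as pool:
            scores = pool.map(score_against_template,
                              [target_seq] * len(templates), templates.values())
        return max(zip(templates.keys(), scores), key=lambda pair: pair[1])

    if __name__ == "__main__":
        templates = {"T1": "MKTAYIAK", "T2": "MKTVYIAQ"}
        print(best_template("MKTAYIAQ", templates))  # prints the best-matching template id and score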

A future project will study physics in the tau-charm energy region, such as the strong and weak interactions.

Computing for Clean Water, run on IBM's World Community Grid, simulates new low-cost, low-pressure water filters. The filters use nanotubes: flow resistance in carbon nanotubes is 1000x lower than predicted, so they work at low pressure. The physical mechanism is not fully understood, so the aim is to simulate it in more detail using molecular dynamics.

Ben Segal, LHC@home starts to tackle real LHC physics

The LHC accelerates protons and other particles; analysis of the resulting collision events is helping us understand the nature of matter, the origin of the universe, etc.

LHC@home is based on BOINC, allowing volunteers to lend their computers; it started five years ago as a tool to help design and tune the accelerator itself. Beams circulate and collide many times per second. The objectives were to raise awareness of CERN and the LHC, as well as providing extra CPU power.

Project has run intermittently, which the volunteers don’t really like. Hoping to start giving them a steady flow of jobs in the next few months.

Now want to do “real physics”.

Serious challenges:

  • Most volunteers run Windows, but the experiments' code runs on Linux and porting it to Windows is impractical. Didn't have that problem with the accelerator design project.
  • Code changes often, so all volunteer PCs must be updated
  • The code is very big

Solution:

Can we use virtualisation? The entire application environment is packaged, sent out as a 'virtual image' and executed in a virtual machine.

Result:

Porting the code to Windows happens automatically. But the virtual image is still very big (10 GBytes), and the whole image must be rebuilt for each update.

So there’s a solution but it’s not very practical.

CernVM – when mainframes ruled the roost, CERN's IBM mainframe was called CERNVM. It was replaced by PCs and Unix/Linux.

The new CernVM is a 'thin' virtual appliance for the LHC experiments – racks of virtual machines. It provides a complete, portable and easy-to-configure user environment for running locally, on a grid or in the cloud.

The new LHC@home project will use CernVM. The user is sent 0.1 GB; the virtual machine then logs into the system and downloads the rest on demand, up to 1 GB, and never has to load the full 10 GB. Can run real physics with just 1 GB.
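The idea is essentially lazy, cached downloading of only the software a job actually touches. A conceptual sketch (the repository URL and file layout are made up, not the real CernVM/CVMFS interface):

    # Conceptual sketch only: the client starts with a small base image and
    # fetches software files on first use, caching them locally, so only the
    # ~1 GB actually needed is ever downloaded.
    import os, urllib.request

    REPO_URL = "http://example.org/cernvm-repo"   # hypothetical repository
    CACHE_DIR = os.path.expanduser("~/.vm_cache")

    def get_file(path):
        """Return the local path of a repository file, downloading it on first access."""
        local = os.path.join(CACHE_DIR, path)
        if not os.path.exists(local):                      # lazy: fetch only when needed
            os.makedirs(os.path.dirname(local), exist_ok=True)
            urllib.request.urlretrieve(f"{REPO_URL}/{path}", local)
        return local

    # A job then opens only the libraries it really uses, e.g.
    # lib = get_file("experiment/v1.2/libanalysis.so")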

However, that’s not the end of it. System runs on BOINC, which needs the jobs to be sent from the physicists. But they won’t change their current set up to produce BOINC jobs – they want to know where their jobs are and be able to manage them, and BOINC doesn’t allow that. Their job submission system does do that.

CernVM has software image control, also added an interface for job management.

All done by students and volunteers, which makes it a bit intermittent.

Peter Amoako-Yirenkyi, AfricaMap – volunteer cartography for Africa

A lot has been said about maps, but old maps are no longer relevant. We need maps that reflect people's real-world needs.

There’s a lack of data and geo-information. UN geographic data in different formats on different platforms. Details that are required to make the map useful are just missing. Maps look good at scale of 2000 ft, but at 200 ft are woefully inadequate.

AfricaMap uses volunteers to do tasks that require no specific scientific training: annotating satellite imagery and providing geospatial data.

It is task-specific because they are working with an agency, UNOSAT, that is well defined in the way it works. UNOSAT provides satellite-based solutions for the UN, local governments, NGOs etc., and has made over 1000 maps/analyses in over 200 emergencies and conflict zones, supporting early warning, crisis response, etc.

Objective of AfricaMap is to help UNOSAT solve this problem.

  • Produce maps for humanitarian causes
  • Generate early warning, crisis response, human rights and sustainable recovery information
  • Capacity for specific requests, e.g. sudden disaster situations
  • Provide historic data

Volunteers look at sat images, on laptop or mobile, and annotate.

This activity has a direct impact on the volunteers themselves, there is an immediate need for this information.

Designed a learning/training system from the beginning, which helps calibrate the volunteers so that they can be given specific jobs that fit their skills.

Create work units from tiled imagery, define and send jobs to volunteers, receive completed jobs, validate them, eliminate work already done, score the volunteers' work, and place them in levels and teams.
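A rough sketch of that workflow (hypothetical names, not the actual AfricaMap code): work units come from tiling the imagery, each tile goes to several volunteers, the majority annotation is accepted, and volunteers are scored by how often they agree with the accepted result.

    # Sketch of the work-unit pipeline described above (illustrative only).
    from collections import Counter

    def make_work_units(image_id, width, height, tile=256):
        """One work unit per tile of the satellite image."""
        return [(image_id, x, y, tile)
                for x in range(0, width, tile)
                for y in range(0, height, tile)]

    def validate(annotations):
        """annotations: {volunteer_id: label}. Accept the majority label, if any."""
        if not annotations:
            return None
        label, votes = Counter(annotations.values()).most_common(1)[0]
        return label if votes > len(annotations) / 2 else None

    def score(annotations, accepted, scores):
        # Volunteers gain credit for agreeing with the accepted result,
        # which feeds the levels/teams placement mentioned above.
        for vol, label in annotations.items():
            scores[vol] = scores.get(vol, 0) + (1 if label == accepted else -1)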

Africa Map is still in active development. Not starting from scratch – building on existing technologies.

Muki Haklay, Extreme Citizen Science

It's about people going out, not sitting at home in front of their computer – e.g. the Christmas bird count, or climate modelling data from weather observations. This has been going on a long time, but it is evolving into cyberscience, e.g. setting up a self-activated camera to monitor wildlife, or a crab survey that requires a recording sheet from the internet.

Can take information out of things like Flickr, Picasa Web, Panoramio and Geograph (a project to photograph the whole of the UK). There is a concentration of photos in cities, but when you control for population, you see hotspots in tourist areas and blind spots in the suburbs.

OpenStreetMap has contributions from 30k active volunteers. Completeness can be tested by comparing to Ordnance Survey data. By March 2010, OSM was significantly better than Meridian 2 in most of the UK.

If you control for whether a street's name has been recorded – because you need to actually be in the place to collect that – the cities show up even more strongly. Where there's higher population, OSM completes faster.

Compare to the index of deprivation, and deprived areas are not covered as well as wealthy areas.
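A sketch of the kind of per-square comparison behind these completeness results (my illustration with assumed field names, not the actual analysis code):

    # Compare total road length per grid square between OSM and OS Meridian 2;
    # squares where the ratio >= 1 count OSM as at least as complete.
    def completeness_by_square(osm_roads, meridian_roads):
        """Both inputs: lists of (grid_square_id, road_length_km)."""
        osm, os_ref = {}, {}
        for sq, km in osm_roads:
            osm[sq] = osm.get(sq, 0) + km
        for sq, km in meridian_roads:
            os_ref[sq] = os_ref.get(sq, 0) + km
        return {sq: osm.get(sq, 0) / length
                for sq, length in os_ref.items() if length > 0}

Mapping the ratio by square, and then aggregating squares by population density or deprivation index, gives the kind of coverage patterns described above.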

Citizen science in perspective

  • Citizen scientists
    • Collect data
    • Act as an intelligent platform for sensors
    • CPU cycles
    • Basic classifications
  • Geographical distribution: bias towards highly populated, central places and tourist areas
  • Bias towards affluent areas and participants
  • Demographic analysis shows high levels of education and interest in the domain

In that way, citizen science is missing a trick.

Literacy: there are still a lot of people who are non-literate and therefore excluded. Almost all projects benefit from the growth of higher education since the 60s, and that number will increase over the next 10 years. Look at the penetration of computers: 70% already have PCs, and the number is increasing. With broadband we're seeing much wider bandwidth, which will allow us to do more interesting things. What I'd like to suggest is that there is potential for 'extreme citizen science'.

Users: the current focus is on people with high education and domain knowledge, but we want everyone to be able to participate regardless of literacy level.

Location: Want to include everyone everywhere.

Role: Get participatory and collaborative mode where people help shape the problem.

This is already happening, e.g. the EPSRC project SuScit, where people are discussing and able to shape how the science is done.

Noise mapping, working on the Isle of Dogs: talked to the community to shape what they want to do, and they said they were bothered by airplane noise. Recruited volunteers in the area. Moved away from computers and used paper: volunteers track noise levels and map where they are.

But then they can enter data on the website, and can provide photos, e.g. to show that there's a stack over Heathrow and a lot of airplanes in the sky at once.

During Eyjafjallajökull, the community kept monitoring throughout the flight ban, when noise pollution was reduced.

Worked in Deptford, in social housing, where a nearby scrapyard has made residents' lives hell. There is also a community centre and a nursery nearby. Worked with the community to monitor the noise; volunteers spread through the whole area and produced a noise map, which was then used in discussion with the local authority about what needed to be done. The community had been complaining for 6 years, but after the local authority saw the evidence, they revoked the scrapyard's licence.

CyberTracker: working with Bushmen in Africa to gather data. They use an iconic representation of information on a GPS device in order to monitor wildlife.

Another project, working with hunter-gatherers to identify things that are important to them, e.g. trees used for food so that they won’t get cut down.

Opportunities are exciting:

  • Interfaces suitable for non-literate users
  • Bundles of sensors, data collection and analysis tools that can be applied in different contexts
  • Understand patterns of use, motivations and incentives – the science of citizen science

The Anatomy of Citizen Cyberscience

Why do people get involved in citizen science?

Becky Parker

When I was young, watching the RI Christmas Lectures, I was inspired by Carl Sagan, who went and had tea on Mars. I can remember where I was sitting and what he said, and thought that would be amazing! Moved from a girls' school to a comprehensive, and was the only girl doing double maths, physics and chemistry. Was so interested in astronomy, went to the Norman Lockyer Observatory to go and look at the stars. Ended up in physics. Had Tony Leggett, Nobel Laureate, and took courses on the foundations of quantum mechanics. Thought: this is so amazing, why don't students love it? Went into teaching, wanting to inspire students. Very lucky: have a good school, and a head that supports projects.

Julia Wilkinson

Gave up science at school – girls weren't encouraged to do it – and has regretted it ever since. The Apollo missions inspired her to get into astronomy. Her passion for astronomy has lasted all her life; she got back into it 10 years ago when she bought a telescope. Three years ago, she was looking for a way to get more involved, saw Stardust@home and thought: I can contribute to real science. A few weeks later she found Galaxy Zoo, which was even better as she had observed galaxies through her telescope, so this was what she wanted to be involved in. On the back of this, she is now studying science with the Open University. She has experience with volunteers, and has noticed a lot of overlap in the way that citizen science works with volunteers – you see the same patterns of behaviour. In the voluntary sector there is a constant need to motivate volunteers, lots of challenges, feedback etc. That's what cyberscience does too, but if you let volunteers know exactly what's happening with the data, that increases morale.

Richard Haselgrove

His interest first formed in childhood: his parents were both involved in the early days of electronic computers, so he grew up with them. Went through a standard school career and university, and that's as far as it went at that stage. Moved into the public sector and left science and computing behind until the arrival of the personal computer, around 1980. Could then start to experiment with computing for more general purposes. Now, the linking of communities by technology is taken for granted. Read in the press about SETI@home and reconnected with his scientific interests. His computer was a volunteer, but he wasn't. Now he's nearing retirement, he is more able to volunteer himself as well. As people have more time to commit, volunteers gain a lot of experience; what draws him further in is developing knowledge that he can pass on to arriving volunteers and to new projects. He can't always get involved with the science behind a project, but can help with how you deal with volunteers, the platform, etc. We as volunteers have an impact on scientists, and have a lot of valuable insight to feed back into the projects.

Christian Beer

Started with SETI@home: was doing an internship in web development and someone there showed it to him. Has run it on every computer he's had since then. Got interested in BOINC, and also in the science – not just how they are searching for aliens, but how the volunteers work together. Contributes not only computer power, but also knowledge of programming. Motivation to learn how to program. Also wanted to give the knowledge away, but it's not giving it away, it's multiplying it. The social part is a great motivation.

Bruce Borden

Interested in the similar stories I'm hearing – we don't know each other, but I have a lot in common with what's already been said. He is a retired scientist with an advanced degree in maths, and worked for an engineering firm doing mathematical analysis, so the concepts of maths and how to do simulations are comfortable. When he retired, he asked the same questions about what he was going to do with the rest of his life and how he could spread the knowledge he has. Had discovered SETI@home and thought it was an intriguing idea. A few years later he got interested in Folding@home, Stanford University's programme. Also influenced by previous volunteer work: he spent two years when he was younger teaching maths in the Peace Corps. The important aspect now is that he's hooked by the science. Also important is managing volunteers and keeping them enthused, and this is an area that is grossly neglected. Need to take care of volunteers' feelings about what they are doing. There is a wide range of people and of skill levels, and you need to deal with them in slightly different ways. The skills he can deliver, in addition to teaching about the science or maths or computers, are mainly on the forums, primarily with an educational goal.

Ian Hewliss (?)

His qualifications are simply that he watched a TV programme about climate change, which invited viewers at the end to run some software. He thought it was easy, but others found it harder. Rang the BBC climate change project and started helping people with technical problems. A sense of community builds up, because people feel what they are doing is relevant, and it references what other people are doing. He is a physics graduate, and has since used that to model the behaviour of satellites, so now does radio comms. An accidental cyber-scientist. What's his motive? A hard one to answer. The word 'citizen' implies a community. We're used to talking about 'citizens' in a political way, with rights and obligations; there are two communities in which the word is relevant – one is the question of why people participate, the other is the community of a particular project. There is a debate to be had about the balance of rights and obligations of participants.

[Then followed a discussion, which I’m too tired to transcribe! Still, interesting stuff.]

Becky Parker, Building cosmic ray detector networks in schools

Enabling school students to do real science via CERN technology. Want to show kids that they can be involved in what is going on now, and that science is a vibrant and interesting subject.

The Langton Star Centre is ?at Simon Langton Grammar School, Kent, and gives students the chance to work alongside scientists and engineers. They go to CERN every year, and one year visited Dr Michael Campbell's Medipix lab, working on a chip for the ALICE detector. The chip can be used for medical imaging, and there is lots of research and collaboration using it.

There was a competition for schools to design an experiment to go into space. They wanted to do a cosmic ray intensity detector using the Medipix chip – called LUCID, which ?will fly in 2012. They won, but the project was a bit expensive, so they have an earthbound version too.

Wanted to get more schools involved, which led to CERN@School, so different schools can look at cosmic rays in space and on Earth via a detector in their own lab. They pick up data and then examine it in the school lab, and can do particle recognition.

Found it hard to get the money together, but got a pilot scheme in ten other schools that take data at a set time each day. The schools pool the data, which then goes up on the LSC servers and can be seen on a map. But how to analyse it and get good science out of it? Now have a model using grid storage and computing. Will soon be able to do analysis of tracks.

Next step would be linking up with other cosmic ray projects.

An expanded project would enable sophisticated analysis and potentially useful results. With enough schools, there would be an enormous network of detectors, which might be able to discover particles above the GZK limit.

CERN@School invigorates teachers as well as inspiring students. Hope to attract more scientists into schools. Doing real science, real analysis, is not only fantastic, it also shows how smart and capable these students are.

Tom Humphrey, Herbaria@home: crowd-sourcing the documentation of natural science collections

Herbaria@home: herbarium records are a snapshot of the world before modern agriculture, including areas now completely obliterated. One find: the Ghost Orchid, a plant thought to be extinct.

Plants don't move, but they do invade, e.g. Oxford Ragwort, a Sicilian plant introduced around 1700 to an Oxford botanical garden, which escaped and has now spread across the UK. Rosebay Willowherb: railways have distributed its seeds and now it's everywhere. Plants also travel by road, e.g. Danish Scurvy-grass should be coastal, but has now colonised verges.

Plant populations are in flux. Modern survey data alone isn’t enough, so need the historic data to give context.

This is a web-based project to catalogue old data. The collection he is working on has 50,000 documents, and there are several million UK-wide; even with willing volunteers working in person there are too many records.

Online, Wikipedia established that people would do this sort of work – you can allow open-access editing and it won't be mayhem. Distributed Proofreaders showed that people will transcribe text over the internet.

Have taken photos of documents and put them online along with a form that people can fill out to say what they see in the label data around the specimen image, e.g. site names, collectors, date.

Some of the documents are quite clear, because they are printed, but there are a lot of hand-written documents, e.g. from 1859, and the handwriting poses quite a problem. Handwriting recognition may eventually get there but it is quite a long way off.

Once you have the data, you can give it a grid reference and put it on a map. A lot of the data can be validated as it is entered. Volunteers need to collaborate and discuss what they see, so there are active message boards. There is a pretty expert volunteer set, e.g. with plant recognition, especially of rare plants.
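As an illustration of that kind of automatic validation (my sketch, not the Herbaria@home code), an Ordnance Survey grid reference entered by a volunteer can be parsed into an easting/northing, which both checks its format and lets the record be plotted:

    # Sketch only: convert a British National Grid reference like "TQ 3000 8000"
    # into metres easting/northing, rejecting malformed entries along the way.
    def parse_gridref(gridref):
        s = gridref.replace(" ", "").upper()
        letters, digits = s[:2], s[2:]
        if len(digits) % 2 != 0 or not digits.isdigit():
            raise ValueError("malformed grid reference: " + gridref)

        def idx(c):                                 # grid letters skip 'I'
            i = ord(c) - ord("A")
            return i - 1 if c > "I" else i

        a, b = idx(letters[0]), idx(letters[1])
        e100 = ((a - 2) % 5) * 5 + (b % 5)          # 100 km square easting
        n100 = (19 - (a // 5) * 5) - (b // 5)       # 100 km square northing
        half = len(digits) // 2
        scale = 10 ** (5 - half)                    # remaining digits give the offset
        easting = e100 * 100_000 + int(digits[:half]) * scale
        northing = n100 * 100_000 + int(digits[half:]) * scale
        return easting, northing

    # parse_gridref("TQ 3000 8000") -> (530000, 180000), roughly central London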

Have worked on collections from several large universities and museums; often they don't have a full-time curator for these collections, so the data would be inaccessible otherwise.

Peer-review of records: people have free access to edit anything, and there is a public edit history for every record. Botanists are able to spot errors and make changes, but there is also a lot of non-professional botanical expertise – people keen to work on a project like this.

Some similar collections likely to come online in similar projects soon, e.g. insects.

Benefits: improve access to collections, raise profile of collections, and people enjoy it as a hobby.

Bruce Allen, Einstein@Home: hunting for neutron stars with gravitational and radio waves

Einstein@Home is a traditional citizen science project: at any one time around 100,000 computers are contacting the project and looking for work. People join via the website, download and install the software, and then leave it alone. They get a screensaver (which is very pretty!), and when their computer is idle it is analysing data.

Physics experiment data: not simulating, but taking real data about the physical world and searching for very weak signals that reveal neutron stars – very compact, small stars, 10 km radius, which beam radio waves like a lighthouse. As the beam passes by Earth you see a flash. A neutron star forms when an ordinary star burns all its fuel and collapses under gravity; the electrons get crushed into the nuclei and combine with protons to form neutrons, which are about 100,000x smaller than the original atom. They spin very quickly for the same reason an ice skater spins faster when they pull in their arms.
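A rough back-of-envelope version of the ice-skater argument (my illustration, with round numbers, not figures from the talk): angular momentum is conserved as the core collapses, so for a roughly uniform sphere

    I_1 \omega_1 = I_2 \omega_2, \qquad I \propto M R^2
    \;\Rightarrow\; \omega_2 = \omega_1 \left(\frac{R_1}{R_2}\right)^2 \approx \omega_1 \times 10^6
    \quad \text{for } R_1 \sim 10^4\,\mathrm{km},\ R_2 \sim 10\,\mathrm{km},

so a core rotating once every few days ends up spinning several times per second.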

Example: the Crab Pulsar, formed in 1054 AD, spins 33 times per second. There are about 100 million neutron stars in the galaxy, but only about 1900 have been found, mostly near us.

Einstein@Home searches for neutron stars using the gravitational waves the stars send out. The detectors, built in the last 20 years, are made of mirrors hanging from wires; when a gravitational wave comes along, the mirrors swing a bit, and that can be detected.

Also use data from Arecibo radio telescope in Puerto Rico. First discovery: 11 July 2010. Signal was followed up the next day and reconfirmed quickly.

Found a second radio pulsar, currently unpublished, appears to be a binary system, but not yet clear what the masses of the stars are.

Publicity around the first discovery has been very inspiring for users and the project team. New users jumped when the publicity happened, and the number of users leaving the software running continues to increase.

The Square Kilometre Array, which will come online in 2019 or later, will produce so much data that distributed computing may be the only way to process it.

Myles Allen, Why does climate science need petaflops?

 

Most citizen computing projects can do no wrong. That does not apply to climate science: ClimatePrediction.net raises a few hackles, often in situations such as a recent meeting about investment in a supercomputer, where someone said, "I hear you can do a lot of this on PS3s nowadays?" and there was a degree of hostility.

It’s just a way of addressing certain problems. People think of models as being done on a petaFLOP Cray XT-6, but most climate scientists don’t have access to these. They can have access to citizen scientists.

Climate modelling depends on:

  • Complexity, e.g. number of processes, number of aspects of the system
  • Resolution of your model, e.g. 100km scale, 10km scale or 1km scale
  • Duration of the run
  • Ensemble size (groups of model runs) – this is where citizen cyberscience comes in

Often need to run models many times, and this is where ensembles come in.
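A rough cost scaling tying those four factors together (my gloss on the list above, not a formula from the talk):

    \text{cost} \;\propto\; \text{complexity} \times \left(\frac{1}{\Delta x}\right)^{3} \times \frac{1}{\Delta t} \times \text{duration} \times \text{ensemble size}

Since the timestep \Delta t usually has to shrink in step with the grid spacing \Delta x (CFL condition), halving the grid spacing costs roughly 2^4 = 16 times more, and going from a 100 km to a 10 km grid is of order 10^4. An ensemble, by contrast, multiplies the cost only linearly but parallelises trivially across independent machines – which is exactly the part volunteer computing can absorb.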

Uncertainty in models varies. Uncertainty was felt to be under-reported, so a subjective assessment of uncertainty was added; some of the numbers are a little bit rounded, as they were decided on through discussion.

The model ranges were suspected to be too small because all the models matched the 20th-century numbers 'suspiciously well'. You need unrealistic models as well as realistic ones: you have to go outside the range that fits perfectly in order to be sure you know what range of forecasts is consistent with current observations.

It takes serious money to do a run, so modelling centres are looking for good models, not 'bad' ones.

By doing tests of different models (using citizen science), they saw that the -40 error bar was too pessimistic in terms of uncertainty, and the +60 was about right.

Learnt that the lower bound was too low and the upper bound about right – but this was through experimentation, not discussion, and is therefore testable.

What next? Using volunteer computing to see how extreme weather and climate are related, as global warming can cause both extreme hot and cold weather events, e.g. the heatwave in Russia and the Pakistani floods. Were they one event or two? Were they related to global warming?

Looking at the flooding in the UK in 2003, simulating seasons where damaging weather events occur, both with and without the signature of climate change, to see if it had an effect. Looking for the influence of an external driver – human influence.

These are rare events, so they have to be modelled many times to see if the risk of an extreme event has increased.
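A sketch of the underlying calculation (my gloss on the talk, not ClimatePrediction.net code): run large ensembles with and without the human-influence signal, count how often the damaging event occurs in each, and compare the two risks.

    # Illustrative attribution sketch: compare event probabilities between
    # an ensemble with the climate-change signature and one without it.
    def exceedance_prob(runs, threshold):
        """runs: simulated seasonal values (e.g. rainfall); return the fraction
        of runs in which the damaging threshold is exceeded."""
        return sum(v > threshold for v in runs) / len(runs)

    def risk_ratio(runs_with_forcing, runs_natural, threshold):
        p1 = exceedance_prob(runs_with_forcing, threshold)
        p0 = exceedance_prob(runs_natural, threshold)
        return p1 / p0 if p0 else float("inf")

    # Because the events are rare, both probabilities are tiny, which is why
    # thousands of volunteer-run simulations are needed to estimate the ratio.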

Projects in development, embedding regional models in simulations.

Have only used participants to provide compute power, so haven't engaged participants' brains. The big challenge faced is that only a few hundred people take part.

 

David Anderson, A Brief History of (CPU) Time

A computer scientist at UC Berkeley, building platforms for citizen science. Looking for commonalities – software support that addresses the community's needs, to make it easier for scientists to use volunteer power. Technology is only one piece of the solution.

Build platforms for:

  • Volunteer computing
  • Distributed thinking
  • Education

Computational science:

Simulations are now so vastly complex that they can only be done on computers. Simulations run at various scales, e.g. proteins, ecosystems, the Earth, the galaxy, the universe. Lots of computing power is needed because models must be fitted to observed data; to predict what's going to happen you need to run thousands or millions of simulations.

A generation of new instruments, e.g. the LHC, LIGO, the SKA and gene sequencers, produce data at unprecedented rates, right at the limit of what computers can handle – beyond the limit of computers owned by universities or institutions. Science is limited by computing power and storage capacity. What we need is not a faster computer, but higher throughput, i.e. a lot of computers.

Consumer digital appliances, e.g. computers, handhelds, set-top boxes, are all converging on similar hardware, plus the networks that connect them all: the consumer digital infrastructure. 1.5 billion PCs. Graphics processing has improved through the desire to watch HD TV and play realistic games, and GPUs are often 100x CPU speed. Put there for games, but good for science.

Storage on consumer devices approaching the terabyte scale, network approaching 1 Gbps.

All this is ideal for science computing!

Compare the consumer digital infrastructure with its institutional counterpart: it's way bigger and way cheaper than institutional computing. Supercomputers are moving towards an ExaFLOPS in 5 years, but consumers already have 1000 ExaFLOPS today. Consumers spend $1 trillion per year!

BOINC, free open source software, anyone can create a project.

The utopian ideal is to have a lot of these projects, getting computing power by advertising research to the public and educating the public about the project, so the public supplies resources to the science where they want to put them.

BOINC projects

  • ~30 projects
  • 300k vols
  • 530k computers
  • 3 PetaFLOPS

Volunteers can do more than run software: they can provide tech support, optimise programs, translate the website, and recruit new users. Initially used message boards, but realised they weren't working well for non-technical users, so now have a system based on Skype, so people needing help can find someone who is willing to give that help via Skype.

Volunteers have a spectrum of confidence, and some users are malicious, e.g. there have been people trying to scam other users and get their PayPal IDs.

Motivations study: people interested in doing science, people who want to show the world they have the fastest computer, people who want to be part of a team.

Distributed thinking. Stardust@home: interstellar dust photos, looking for grains of dust. People can do this better than computers. The interesting thing was the need to quantify the accuracy of results. Created samples where they knew the answers, i.e. either containing noise or a particle. Every 5th image was a calibration image, so they could keep track of false positives and negatives.

Also used replication: many people look at the same image, and if there is a consensus the calibration results show whether that consensus is likely to be correct. The project found all the dust particles it could.
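A sketch of how calibration and replication fit together (my illustration, not the Stardust@home code): known-answer images track each volunteer's error rates, and those rates then weight the consensus on real images.

    # Illustrative calibration + consensus logic for distributed thinking.
    def update_calibration(stats, volunteer, said_particle, truly_particle):
        # Called whenever a volunteer classifies a known-answer (calibration) image.
        s = stats.setdefault(volunteer, {"fp": 0, "fn": 0, "n_pos": 0, "n_neg": 0})
        if truly_particle:
            s["n_pos"] += 1
            s["fn"] += 0 if said_particle else 1
        else:
            s["n_neg"] += 1
            s["fp"] += 1 if said_particle else 0

    def error_rates(stats, volunteer):
        s = stats[volunteer]
        fpr = s["fp"] / s["n_neg"] if s["n_neg"] else 0.0
        fnr = s["fn"] / s["n_pos"] if s["n_pos"] else 0.0
        return fpr, fnr

    def consensus(votes, stats):
        """votes: {volunteer: True/False} for one real image. Weight each vote by
        that volunteer's calibration accuracy; True means 'particle present'."""
        weight_yes = weight_no = 0.0
        for vol, said_particle in votes.items():
            fpr, fnr = error_rates(stats, vol)
            accuracy = 1.0 - (fpr + fnr) / 2
            if said_particle:
                weight_yes += accuracy
            else:
                weight_no += accuracy
        return weight_yes > weight_no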

Created platform called Bossa. Middleware for distributed thinking, provides scheduling mechanisms, e.g. calibration jobs, replication. Open system with respect to assessment, scheduling policies.

Being used to find fossils.

Also extending Bossa to Bossa Nova, looking at more complex systems for asking people to do things involving creativity – problems for which there is no unique answer, e.g. complex problem solving: use volunteers to decompose a problem into sub-problems, propose solutions, evaluate them, and evaluate how a group of solutions might work together. This involves different skills. At the software level, it uses people optimally, assigning them the tasks to which they can contribute most.

Education and citizen science. If we can train people to do more complex tasks we can achieve more. This is very important – if people learn more they may stick with a project longer and recruit more people to help, and you get more computing power.

The challenge of training or educating hundreds of thousands of people is a hard problem, not attacked by traditional education theory. Heterogeneity is a problem: different backgrounds, education levels, locations, languages. What makes this tractable is that there is a constant stream of students arriving – dozens, hundreds, thousands of new users per day – so there are lots of people arriving interested in a course, and we can do experiments. If we have two alternative ways of teaching a concept, we can rig up the software to randomly show one lesson or the other and then have everyone take the same assessment.
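A toy version of that experiment (illustrative only, not the actual Bolt implementation): each arriving student is randomly shown one of two lessons on the same concept, everyone takes the same assessment, and the mean score per lesson indicates which teaching approach works better.

    # Randomised lesson-comparison sketch.
    import random
    from collections import defaultdict

    LESSONS = ("lesson_A", "lesson_B")
    results = defaultdict(list)            # lesson -> assessment scores

    def assign_lesson(student_id):
        return random.choice(LESSONS)      # random assignment per new volunteer

    def record_assessment(lesson, score):
        results[lesson].append(score)

    def mean_scores():
        return {lesson: sum(s) / len(s) for lesson, s in results.items() if s}

    # e.g. record_assessment(assign_lesson("vol42"), 0.85); with thousands of
    # new users per day, mean_scores() separates the two lessons quickly.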

We may learn that one lesson is better than another, or that one lesson is better for a subset, e.g. based on a demographic or other attribute, and can then make an adaptive course where, as we learn more about the student, we refine how we teach them – not just individual lessons, but the overall structure of the course.

Bolt: system for tailored education for large streams of volunteers.