Everything could be important. I checked things that, from my temperate system bias, seemed likely to be important for the critters I study.
Link to publications dealing with the same sampling site? (but this type of information in a database needs to be regularly updated...)
Many of these factors are going to vary with the goals of the sequencing effort. For instance, large-scale, characterizing soil sequencing efforts need more of this information than sequencing efforts that are focused on point-source soil samples. Studies of polluted sites will require different metadata from studies of farmed (or native, or restored) sites. Rhizosphere soils may not present enough soil to do many of the suggested characterization analyses along with sequencing. I hope that these differences are reflected in the final requirements and give researchers a certain latitude in what metadata is required if it wasn't an appropriate (or possible) piece of information to collect.
Taxonomic names are the most important information.
It would be MUCH MORE valuable if the plant species or functional types were recorded rather than the very crude data of percent cover and root biomass, etc. This would enable community metagenomics once plant genomes or molecular data were to be added. I'm assuming this concerns prokaryote data - or will it include fungal, viral, or protistan data too?
You need to categorize the data collected according to what is critical for directly understanding the molecular or genomic data obtained. For example, C fractions, microbial activity (whether respiration or enzymatic), and N-related data are extremely important because these may either be a consequence of or very influential on the microbial communities represented by the extracted soil DNA. The other properties are important in understanding the total ecosystem in which the microbial communities developed, and may have more of an indirect impact. Having long experience in field work, I routinely collect most of the suggested data, which is helpful in interpreting microbially-related results. Good luck.
Perhaps some comments about means used to characterize microbial samples? e.g., methods of extraction, use of pyrosequencing, etc.
The importance that any single investigator places on these variables is going to be highly dependent on the types of questions an investigator is interested in asking. Also the difficulty of collecting the information is going to depend on the availability of equipment/long-term data sets at their research site. Maybe that's what's being asked here, but it's not clear.
Available C = we don't know what this is. Equating it to DOC is likely seriously inadequate. If you want to ask for DOC, OK, but don't call it available C. Perhaps the single most important variable is soil classification. You would never, ever consider submitting a paper to Plant Physiology and describe your study system as a "broadleaf deciduous tree"--you would always provide an accurate and appropriate description of the study system: Quercus lobata. The same logic applies to soils work. If you don't know what type of soil you're working with, you can't do comparisons. Knowing it's a clay loam doesn't tell you whether it's a California grassland Mollisol or a Puerto Rican rainforest Ultisol. But knowing it's a Pachic Argixeroll tells you a hell of a lot about the system the soil came from. All papers on soils should be required to provide an accurate description of the soil taxonomy. Same for metadata. Many of the listed variables co-vary strongly--climate, mineralogy, CEC, base saturation, etc. If you know climate, soil type, landscape position, and horizon, you can infer many of the other variables. Horizon is critical there though--not just depth. If you take a 20 cm core, are you all in the A horizon or are you mixing in B? Comes back to doing a proper characterization of the actual system you are studying. Not doing an adequate soils description has been a major problem for a number of fields spanning from environmental microbiology to ecosystem ecology and biogeochemistry.
Plant species composition is a critical soil descriptor that does not seem to be explicitly covered by any category above.
My feeling is that systematizing soil metadata is a good idea. This is a reasonable way to begin to call attention to the need to learn from trends that may emerge from molecular studies of soil microbial communities. Without uniform metadata sets, trends cannot be discerned. Obviously all elements (C, N, S, P, ...) cycle in all soils. It would be wonderful to have the entire list of suggested measurements every time everyone took a sample and made a clone library from soil. HOWEVER, I was reluctant to check the 'REQUIRED' boxes. Every scientific inquiry has its own goals and needs. I think certain types of metadata should be encouraged. But there should not be barriers to submission. (E)
It is rather difficult to express what I find most important and feasible for metagenomics in such a table. Importance of parameters varies with the type of terrestrial ecosystem that is sampled, e.g., a polluted industrial site, a forest, or an arid ecosystem. Water table is important for the first, plant cover and root biomass for the second, and radiation intensities and erosion for the last (just examples). The problem with composite samples is that they may give a more representative view of the communities existing at a specific site, but they might lose valuable microbial diversity as microsites get diluted out.
If one was going to consult a metadata base, who wouldn't want as much information as possible? If that is the case, then wouldn't we all check the required box and rank every importance criterion as high? Also, degree of difficulty will vary from institution to institution, agency to agency, etc. What I might consider easy, others, without the same equipment, may consider very difficult.
It is essential to include more data on processes, i.e., the activities of several enzymes in soils. For this there is another task to be resolved, the standardization of these procedures. I offer my experience in this field if there is a general feeling that this should be included. (P)
I worry about "Total P" and "Total S". Available P and S are very important, but "total" is not. I believe there is considerable ambiguity in this.
1) Have standard methods for soil analysis in order to ensure comparability of the data. Develop written and video (as applicable) protocols to assist with this. 2) Ideally, have one or a small number of core labs do all of the soil analyses that do not have to be done on-site. 3) Possibly encourage deposit of soil samples and DNA in a centralized location(s) for potential use by other researchers. There may be new methods or soil properties that have not been considered that someone else would like to run in the future on the samples. It would take a lot of coordination to decide who would store the samples, how they would be stored, and who could get access to archived samples. Perhaps a committee could oversee this and researchers could submit research proposals in order to obtain samples. 4) Present information about Terragenome at relevant meetings for their feedback. For example, a joint symposium/workshop (including all of the soil science, and maybe some of the crop and agronomy, divisions) at the Soil Science Society of America annual meetings would seem to be an ideal place to get feedback and to advertise this program to other scientists.
Thanks for doing this! ... I'm not a soil biologist, so I cannot assess the difficulty of obtaining things - I am looking at this from the perspective of a bioinformatician with an interest in data integration and metagenomics data mining - I am also involved in pulling these lists together for other ecosystems (i.e., the human body, in the framework of the International Human Microbiome Consortium), so I can compare with what's going on there. From what I can see, one very important missing variable is time and date of sampling, also sample treatment after extraction (freezing, etc.), details on DNA extraction, etc. Let me know if I can help further. (J)
The importance of some of the properties will depend on the context of the study and on the particular soil and climatic conditions, e.g., conductivity is unimportant in temperate soils with high rainfall (my situation), pesticides will become very important if the study is about pesticides, etc. I hate seeing studies published with inadequate soil descriptions - this makes cross soil comparisons very difficult.
I would hesitate to require lots of additional soil tests, since procedures will vary among labs. The main requirements should be GPS data on the sampling location and procedures, land use, extant vegetation, and as much site history as possible.
I don't think it's a good idea to require lots of additional tests, because test procedures will vary from lab to lab. The most important information is location (GPS coordinates), depth of sampling, sampling procedure, land use, and as much site history as possible. Investigators should be encouraged to explain the objectives of their study and provide as much information as they can about their soils.
It is hard to give a proper score on the difficulty because I have not done many of these. So do not trust such scores very much. Thanks! (J)
It would be great to have ET, as well, but this measurement may be limited by available equipment. Could instead suggest a designation for closest weather station with this info, and give the coordinates of the weather station and the distance between the field site and the weather station. Also, might want to provide the extent and grain size of sampling regime.
I have only provided importance/difficulty information for those variables that I think are required or at least highly desirable.
Could also use extractable P and S, as well as total P and S. These are easy to obtain, and mean more biologically.
We could help with any of the analyses if needed. Thanks for the survey. (B)
It would be nice if a distinction could be made between resting and active populations, e.g. staining spores.
Soil microorganisms need a food source, elements to balance C uptake, a pH for enzyme function, moisture, and a habitat. Collection of the above-mentioned data will be necessary so that you can correlate these parameters with characteristics of the sampling sites and other geochemical protocols.
Maybe it would be worth including also the following: 1. representative soil sample size and whether it was determined; 2. greenhouse gas emissions measured at the same sites; 3. description of the method used for sampling - random, transect, designed, modelled based on previous data, ...; 4. soil moisture profile for the soil below the sampling depth; 5. orientation of the site - sunny side of the slope (exposed to sun) or not; 6. method of DNA extraction (citation of a reference).
Some of the measures are closely correlated, and can substitute for one another. Most likely researchers would choose the one they can do most easily, which is fine. It is better to have some measure for the trait than none. Any information on fungi, especially fungal biomass, would be important. Some features are important depending on region, e.g., arid region salt is very important but not elsewhere; similarly, in the tropics, Al is important but not elsewhere. So, the features form should accommodate traits important to regions.
It is hard to characterize 'difficulty' because there are interacting variables: 1) difficulty of performing the analysis; 2) expense of performing the analysis; 3) degree to which difficulty/expense scales with number of samples (some things are trivial if you are working with a single sample but very difficult if you are working with 100, while other things are fairly easy regardless of number of samples); 4) difficulty of accurately reporting the data: things like pH are easy to report, but things like historical land use could be quite difficult (e.g., 1900-1910 crops, 1910-1950 successional grasses, 1950-1990 forest, 1990 burned, 1990-2010 forages); 5) difficulty of getting reliable data: things like historical land use or past crop rotation, while very important, may be impossible to get if accurate records have not been kept by land managers.