October 11, 2019 - Caroline

RPP #6: Quantitative Data Sources

I am proposing to research agribusiness because I want to find out what explains the corporate circumvention of regulations that enables the production and sale of “banned pesticides,” in order to help my readers better understand whether or not the business decisions behind the use of agrochemicals are for the public good.

My large-n analysis research question is:

What explains the variation in regulating pesticide use within a country?

I plan on creating my own dataset for the dependent variable, the regulation of pesticide use by country (my unit of analysis), drawing on data sources such as the Food and Agriculture Organization of the UN (FAO) and the Organization for Economic Co-operation and Development (OECD). I would operationalize the dependent variable at an ordinal level of measurement, on a scale running from “low” to “moderate” to “high.” I am using an ordinal rather than an interval level of measurement because no single, concise dataset exists for this variable; my coding would instead be based on studying a range of other, more specific datasets. I would then check whether there is any correlation among those datasets.
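A minimal sketch of how the ordinal coding and the correlation check described above might look in Python. Everything specific here is an assumption for illustration: the `RegulationLevel` scale, the placeholder country names, and the numeric values are hypothetical and not drawn from FAO or OECD data. Because an ordinal scale only carries order, the sketch uses a Spearman rank correlation (with tied ranks averaged) rather than a correlation that assumes interval data.

```python
from enum import IntEnum
from math import sqrt


class RegulationLevel(IntEnum):
    """Hypothetical ordinal scale: only the ordering of levels is meaningful."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank lists.

    Raises ZeroDivisionError if either variable has no variation.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Placeholder cases and values, NOT real data.
coded_dv = {
    "Country A": RegulationLevel.LOW,
    "Country B": RegulationLevel.MODERATE,
    "Country C": RegulationLevel.HIGH,
    "Country D": RegulationLevel.MODERATE,
}
interval_iv = {"Country A": 0.40, "Country B": 0.25, "Country C": 0.05, "Country D": 0.20}

countries = sorted(coded_dv)
rho = spearman([coded_dv[c] for c in countries], [interval_iv[c] for c in countries])
print(f"Spearman rho: {rho:.3f}")
```

With only three categories, many countries will tie on the dependent variable, which is one reason a rank-based measure is the safer choice for a scale like this.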

One independent variable that I would include is the OECD’s interval measure of foreign direct investment regulatory restrictiveness.[1] These numbers could hint at how much power the government of a country hosting a foreign MNC has, relative to the company itself, over pesticide use. They could also provide insight into how the extent of a country’s foreign market involvement affects its control over certain sectors like agriculture. Another dataset that I plan on using for an independent variable is government expenditure on environmental protection.[2] How much money or attention the government puts towards the environment could be a good indicator of the strength of the resulting regulations. A further indicator is the Environmental Performance Index, which uses ordinal levels of measurement, via rankings and scores, to show how well or poorly a country is doing in areas like lead exposure (which could be an indicator of a larger environmental toxicology problem).[3]

In terms of cases covered, my dataset would definitely include China, the top exporter of pesticides, and Brazil, the top importer.[4] I would focus on countries in the European Union, Asia, and Latin America, since they are the ones covered in the datasets I found.

As for limitations, because the datasets come from different organizations, they vary in scope and format. Another possible limitation is missing values, especially from the 1990s. I would still like my time frame to include the 1990s, however, because that is when the major international chemical conventions occurred, and I want to capture any interesting increases or decreases in regulation before or after countries ratified them.

[1] “OECD FDI Regulatory Restrictiveness Index.” OECD.Stat. Accessed October 10, 2019,


[2] “Government Expenditure (subsection: Environmental protection).” FAOSTAT. Accessed October 10, 2019, http://www.fao.org/faostat/en/#data/IG

[3] “Lead Exposure Results.” Environmental Performance Index. Yale Center for Environmental Law & Policy. Accessed October 11, 2019, https://epi.envirocenter.yale.edu/epi-indicator-report/PBD

[4] “OEC – Pesticides (HS92: 3808) Product Trade, Exporters and Importers.” Accessed October 11, 2019, https://oec.world/en/profile/hs92/3808/



  • Rachel Rubin says:

    Hi Caroline.
    This is the first I’ve read about your project, and I think you’ve done a great job with this post. I notice how you made the distinction between ordinal and interval levels of measurement, which is critical in order to understand the limitations of what your dataset can actually tell you. I am also impressed by the creative possibilities for independent variables you’ve identified–thinking about foreign influence, small signals of larger environmental problems like lead exposure, and imports and exports are all multidimensional ways of looking at the puzzle, and you’ve clearly done the reading of existing scholarship here to come up with those.
    I encourage you not to think of the limitations in scope as setbacks. I have also found that datasets are usually most limited temporally and geographically, which isn’t necessarily meant to be discouraging–it helps us narrow down the exact timeframe and region of the world we may want, or have, to home in on. It is also astute of you to note that there are missing values in the dataset, and I wonder what that might mean–why is it that those particular values couldn’t be collected? And is it possible that any of the data has been skewed by the decision-makers themselves? Is there an inherent slant in the data in order to promote a certain environmental image? These questions don’t necessarily matter right now, of course, but they are certainly something to keep in mind as you continue to find data to use in the creation of your own dataset.
    I look forward to following the rest of your project as it progresses.

  • Savannah Kleeman says:

    Hi Caroline!

    I really liked what you did in this post! I liked how you used ordinal statistics in this. I also think that using nominal measurement could be very useful in your research. I think that the stats that you found can lead to very interesting results, and that the way you are thinking of operationalizing your research is very well done. I think that using a variety of independent variables can help to further your research.

    I am very excited to see where your project goes! Good luck!

  • Caroline — the datasets that you discuss in this post are clearly relevant to your project and they will provide good information (particularly on independent variables, as you note). Remember, though, that operationalizing the DV (which includes making sure that there is data that demonstrates the variation that you propose to analyze) is the more critical first step. As we’ve discussed, you really want interval/ratio data for the DV. How could you operationalize the concept of “regulating pesticide use” in such a way that you would have an interval/ratio indicator?

    As you move forward, keep the question of case selection in mind as well. You note that your dataset “…would definitely include China, the top exporter of pesticides, and Brazil, the top importer” and that you would “…focus on countries in the European Union, Asia, and Latin America…” Remember the general principle here that having more cases is always better. Absent dataset limitations, it is best to include as many cases as you can rather than try to pick certain regions or countries (picking these biases the sample that you are analyzing). If needed, you can always include control variables for concepts such as region, as Ross does.
