Equitable Data Collection: Best Practices for Demographic Questions

This is part three of Percolator's Equitable Data Collection series. Join us as we take a deep dive into asking the tough questions.

By Kai Addae | November 19, 2024
Image: "Question mark" by konradfoerstner is licensed under CC BY 2.0 (https://www.flickr.com/photos/31166238@N02/4168966589)

This is part three of our series about equitable data collection. In part one, we discussed the foundations of equitable data collection, and in part two we covered important considerations in the design of your data collection form or survey. In this section, we review best practices for collecting identity and demographic information. We aim to answer the following: How should I ask participants about their race, gender, and/or sexuality? How can I make sure my participants have accurate and comprehensive ways to identify themselves and that I can use the data in aggregate for analysis?


Many institutions ask about the identities and demographics of participants out of habit, without really thinking through the “why” of collecting this information. Being clear about the purpose of this information, and considering the impact of these questions on your participants, can guide exactly how, when, and where you place these types of questions in your surveys.

In the previous section, we reviewed why it’s so important to design your survey with the goal of clear comprehension by respondents. In the same vein, you should take great care to use inclusive and accurate categories around identity and demographics. Doing so reduces participant stress, builds trust in your organization and process, and ensures that you are collecting accurate demographic data about your respondents.

When to ask about Identity

Typically, organizations place demographic questions at the end of a survey, on the common belief that respondents will still answer them even after survey fatigue has set in. There are also arguments for placing them at the beginning. If you ask them first and save progress throughout the survey, you can better understand participation and drop-out rates by demographics, which can help you identify issues in your surveys that are impacting specific groups. Additionally, some research has shown that placing demographic questions at the beginning leads to higher response rates for those questions and does not impact the response rates for subsequent questions.
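The drop-out analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not a feature of any particular survey tool: the record layout, the `age_range` and `completed` field names, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

def completion_rate_by_group(responses, group_key="age_range"):
    """Return {group: share of respondents in that group who finished the survey}."""
    started = defaultdict(int)
    finished = defaultdict(int)
    for response in responses:
        # Respondents who skipped the demographic question form their own group.
        group = response.get(group_key) or "not provided"
        started[group] += 1
        if response.get("completed"):
            finished[group] += 1
    return {group: finished[group] / started[group] for group in started}

# Hypothetical saved-progress records, for illustration only.
sample = [
    {"age_range": "18-24", "completed": True},
    {"age_range": "18-24", "completed": False},
    {"age_range": "25-34", "completed": True},
    {"age_range": None, "completed": False},
]
rates = completion_rate_by_group(sample)
```

A noticeably lower completion rate for one group is a prompt to look more closely at the survey itself, not a conclusion about that group.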

There is no one-size-fits-all approach here, but you should take the time to think through the best placement of your demographic questions. Here is a helpful set of questions for deciding when and where to place demographic questions within your survey:

  1. Do you need to collect this information? Do you have an actionable plan to use this data once collected?
    • If not, double-check whether this data is something you really need. Remember, asking individuals about personal and demographic information can cause emotional stress and potentially put individuals and communities at risk, so it should not be done without a clear purpose.

  2. Do you need to collect this information from everybody, or just from some individuals you are working with?
    • Perhaps you only need this information from a percentage of individuals, or individuals who have completed a certain activity. If possible, find a way to dynamically display these questions so that you only ask for this information when you need it.

  3. Could asking these questions cause undue emotional stress? Could these questions discourage someone from completing the survey at all?
    • When asking questions about sensitive or distressing topics, it can be helpful to warn or notify participants about what topics you’ll be covering in advance. This allows participants to manage their own mental health.

  4. Could asking these questions impact the ability of participants to accurately respond to other parts of the survey?
    • Stereotype threat describes the phenomenon where the stress of being aware of a negative stereotype about a group they are part of causes individuals to perform differently in settings where the stereotype may feel relevant. Research has shown that being reminded of or preoccupied with a negative stereotype can cause people to respond differently than they would when not stressed. To avoid this impact, you could put demographic and identity-based questions at the end of your survey, so that they don’t affect earlier answers, and/or provide a clear blurb that assures participants of the ways you will use this information positively.

  5. Do the answers to these questions qualify or screen out participants from needing to complete the survey?
    • If the demographic questions you ask help to qualify or screen participants from needing to complete different parts of the survey (or the entire survey itself), then you should ask these questions at the beginning of your survey so you don’t waste the time of any respondents who don’t qualify.

Asking about Identity

When asking about personal identities and demographic information, you should keep in mind the foundational principles of being purposeful, grounded, and transparent.

  • Be purposeful and specific: Only ask for the data you need, at the level of granularity necessary for your purposes. For example, if you need to know the average age of your respondents but don’t need the specific ages of individuals, you could ask about age ranges or only for birth year instead of asking for birthdate.

  • Be open and transparent about the ways you are using this data and how it benefits participants: As always, and within the survey itself, be transparent and clear about why you are collecting this data, how it will be used, how you will keep this information safe, and whether there are opportunities for participants to take advantage of your learnings.

  • Be inclusive and keep questions grounded in the context of your audience: Identity categories like race, gender, and sexuality are social constructs – categorizations and labels that we use to describe a spectrum of identities with ever-evolving language. To reflect this complexity, you should consider a variety of factors to make sure you are not unintentionally excluding or othering particular identities in your surveys:
    • Consider using simple open-ended questions rather than prescribing specific categories when asking about identity. One caveat: for questions on extra-sensitive topics like sex and sexuality, people are more likely to respond to questions with a limited set of options.

    • When using prescribed categories for different identities, it’s best practice to use multi-select fields so that people can choose one or more options.

    • Include an option to not answer and/or an open-ended option for more specificity like ‘prefer to describe’ or ‘prefer to self-describe’ (rather than ‘other’ which could feel exclusionary).

    • Be intentional about the ordering of answers and language around open-ended options. A best practice is to list answer options alphabetically or to randomize the order and to clearly note how order was determined. This helps prevent unintentionally communicating that some categories or identities are more or less important than others and prevents bias in responses.

    • Allow people to opt out of answering demographic questions.
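Several of the practices above (multi-select answers, a documented ordering rule, a self-describe option, and the ability to opt out) can be expressed as simple form logic. The following is a rough Python sketch under stated assumptions: the function names, the fixed option labels, and the rule that "Prefer not to answer" cannot be combined with other choices are illustrative, not taken from any specific form builder. The pronoun list is the set of examples quoted later in this article.

```python
import random

# Options that always stay at the end of the list, in this order.
FIXED_OPTIONS = ["Prefer to self-describe", "Prefer not to answer"]

def build_options(categories, order="alphabetical", seed=None):
    """Order the answer categories by a stated rule, then append the fixed options."""
    if order == "alphabetical":
        ordered = sorted(categories, key=str.casefold)
    elif order == "random":
        ordered = list(categories)
        random.Random(seed).shuffle(ordered)
    else:
        raise ValueError(f"unknown order: {order!r}")
    return ordered + FIXED_OPTIONS

def is_valid_selection(selected, options):
    """Accept any number of listed options; an empty selection means the question was skipped."""
    if any(choice not in options for choice in selected):
        return False
    # "Prefer not to answer" cannot be combined with other choices.
    return not ("Prefer not to answer" in selected and len(selected) > 1)

pronouns = ["she/her/hers", "they/them/theirs", "ze/hir/hirs", "he/him/his"]
options = build_options(pronouns)
```

Whichever ordering rule you pick, state it next to the question so respondents know how the order was determined.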

As you design your data collection process, you will need to find the right balance between collecting data you can analyze in aggregate and gathering it in a way that is inclusive and respectful of a wide range of identities. It’s unlikely you will get it 100% right on the first try, so be willing to experiment, iterate, and adapt to feedback. Be proactive about testing your tools and getting feedback from your audience, and be ready to respond to what you learn as you figure out what works and what doesn’t.

Breaking it Down: Example Demographic Questions

As a helpful jumping-off point, we will review some of the most common demographic categories, identify best practices, and share examples of inclusive and equitable demographic questions. Most of the examples provided are from More Than Numbers: A Guide Toward Diversity, Equity and Inclusion (DEI) in Data Collection by the Schusterman Family Philanthropies and Guidance for Researchers When Using Inclusive Demographic Questions for Surveys, an article from the Psi Chi journal reviewing evidence-based best practices. Both are well-researched and comprehensive guides in their own right that we highly recommend for further reading on this topic.

Race and Ethnicity

Race and ethnicity are ambiguous categories based on a variety of factors like national origin, socio-cultural context, family background, and personal experience. Asking about race and ethnicity can be complicated, as many people, especially in the US, have trouble differentiating between these two categories.

It is helpful to provide clear definitions and examples when asking about race/ethnicity to reduce confusion for respondents and improve the accuracy of responses. Candid has created a clear population classification system that can be a helpful reference for defining ethnicity. The U.S. Census Bureau has continuously worked to refine, test, and improve its racial-ethnic questions and definitions, and can be another helpful resource to review.

Categories for race and ethnicity change from place to place, and will be heavily dependent on your geographic location and the country where you are based. The following examples are US-centric and may not be applicable to all organizations, but are a good starting place for organizations that need to collect this type of information from their communities.

If you are asking about race and ethnicity outside of the US, we recommend taking time to better understand best practices for the country you are in, and reviewing examples from other organizations doing data collection in your country. Additional layers like national origin may be more important demographic categories depending on where you are and the target populations you are trying to learn more about.

Gender, Sex, Pronouns

Historically, institutions have collapsed the categories of gender identity and sex, and asked about gender using terms associated with biological sex (male, female) rather than language grounded in gender identity (woman, man, transgender). As a more nuanced understanding of sex and gender identity has increased, people have begun to move away from this paradigm, but even still, similar flawed understandings of categories around gender identity have led many institutions to use information gathered about gender identity to incorrectly assume pronoun usage (and vice-versa).

Here are some helpful explanations of the differences in these categories from It Gets Better’s glossary. This is a good place to start when drafting specific questions around sex, gender, and gender expression.

  • Sex: At birth, infants are commonly assigned a sex. This is usually based on the appearance of their external anatomy and is often confused with gender. However, a person’s sex is a combination of bodily characteristics including chromosomes, hormones, internal and external reproductive organs, and secondary sex characteristics. As a result, male or female may not accurately describe an individual's sex.

  • Gender Identity: One’s internal, deeply held sense of gender. Some people identify completely with the gender they were assigned at birth (usually male or female), while others may identify with only a part of that gender, or not at all. Some people identify with another gender entirely. Unlike gender expression, gender identity is not visible to others.

  • Pronouns: The part of speech used to refer to someone in the third person. Examples include she/her/hers, they/them/theirs, ze/hir/hirs, he/him/his. Pronouns are chosen by each individual and can only be known when shared. “Pronoun” is more accurate than the outdated phrase “preferred pronoun.”

Depending on your needs, there are different recommended approaches for asking about gender, sex, and third-person pronoun usage. Coming up with a sufficient and complete list can be difficult, as language around gender identity changes quickly, people may identify as more than one gender, and a person’s understanding of their own identity can shift over time.

The simplest and most flexible option would be the open-ended question, “How do you currently describe your gender?” with an option to not disclose. This allows you to be more responsive to rapidly changing language around gender identity, acknowledges that gender identity may change over time, and gives your audience the most freedom in how they respond. This approach does require grouping and categorization of responses after the fact, but gives you the opportunity to most accurately reflect people’s gender identities as they would describe them in your analysis.
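The after-the-fact grouping this approach requires might look something like the following Python sketch. The `GROUPS` mapping, the group labels, and the sample answers are hypothetical and deliberately small; a real mapping should be built and revised with feedback from your own audience. Anything the mapping does not recognize is set aside for a person to review rather than being forced into a category.

```python
from collections import Counter

# Hypothetical mapping from normalized free-text answers to analysis groups.
GROUPS = {
    "woman": "Woman",
    "man": "Man",
    "nonbinary": "Nonbinary",
    "non-binary": "Nonbinary",
    "non binary": "Nonbinary",
}

def normalize(text):
    """Lowercase and collapse whitespace so ' Non-Binary' matches 'non-binary'."""
    return " ".join(text.lower().split())

def group_responses(responses):
    """Count responses per analysis group and collect unmatched answers for manual review."""
    counts = Counter()
    needs_review = []
    for raw in responses:
        if raw is None or not raw.strip():
            counts["Not disclosed"] += 1
            continue
        group = GROUPS.get(normalize(raw))
        if group is None:
            needs_review.append(raw)  # keep the respondent's own words
        else:
            counts[group] += 1
    return counts, needs_review

# Hypothetical answers, for illustration only.
answers = ["Woman", " non-binary", "Non binary", "man", "", None, "agender"]
counts, needs_review = group_responses(answers)
```

Keeping the original wording alongside the assigned group lets you revisit the mapping as language changes without going back to respondents.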

The second-best approach would be a similar question with the option to select from multiple identities, including an option to self-describe if desired. ‘Transgender’ by itself should not be a gender identity category, and most guidance recommends that you ask for gender identity separately from asking about transgender identity. An acceptable alternative to reduce question length is to collapse gender identity and transgender status into one question (as shown in the second example below from the Williams Institute).

For programmatic purposes, you should ask about sex, gender, and pronoun usage separately, and not assume answers based on other questions about sex or gender. Here especially, we remind you to be purposeful in your data collection. Only ask for information that is truly relevant to your organization’s purpose, and avoid collecting data just for data’s sake.

For non-programmatic purposes, the only relevant question is usually information about how to refer to people (i.e. pronouns, titles, or honorifics), and questions about sex and gender identity should not be included. When asking for pronouns you should, at minimum, always include a gender-neutral option (they/them). However, we recommend providing a fuller list to choose from including the option to self-describe so that people can accurately respond.

Asking about titles or honorifics can be problematic, as some titles bake in hierarchies about gender and marital status. If you do need to ask for them, make sure the question is not required, and ensure you include some gender-neutral options.

Whatever approach you choose, we recommend testing with and getting feedback from members of your target audience to make sure that the phrasing of your question makes sense and that you use a list of relevant and expansive gender identity options that feel inclusive to the people you will be surveying.

Sexual Orientation

Sexual orientation can be a sensitive topic for many. When asking about it in your surveys or forms, you should be intentional in the phrasing of your question to make sure you do not other, shame, or imply heterosexuality as the ‘default’. When asking about sexual identity, use a closed and multi-select list of all sexual identities you are aware of, rather than an open-ended question, which may reduce the likelihood of someone responding because of the sensitive nature of this topic.


An important note: Questions about sexual orientation (as well as gender and sex) can be extremely sensitive for people. In different countries, and even in some cities and states in the US, engaging in certain sexual behaviors is extremely stigmatized and may be criminalized. If your survey is not being self-administered, is being taken in a public place, or you lack trust with your audience, you will get fewer and less accurate responses to these types of questions. Consider the environment in which your audience will be taking the survey, and whether they will have adequate privacy, as you decide whether to include questions of this nature.

Disability

When describing people with disabilities, you should use people-centered or people-first language, referring to the person first, and the disability second (e.g. person with a disability rather than disabled person). Understanding of what qualifies someone as a person with a disability is broad, so it can be helpful to introduce questions about ability/disability status with clear definitions, and to separate ability/disability status from need for accommodations or diagnosis to prevent underreporting.

You should not use questions about disabilities as stand-ins for what accommodations someone may need, and vice-versa. For example, if you need to understand what accommodations you may need to make at a venue for your donors, you should ask that explicitly, not ask about what disabilities a person may have and try to infer what to do. Following are some helpful examples of questions about both.

Takeaways:

  • Consider the placement of demographic questions and how asking about sensitive topics may impact responsiveness of your target audience.

  • Build trust with your audience by sharing why you are collecting demographic data and how you will use it in decision-making.

  • Allow people to opt out of providing demographic information, and whenever possible, ask open-ended questions to be more inclusive of the wide spectrum of identities.

  • Do your research! Review the research and current best practices, find out what peer organizations are doing, and get feedback from your target audience to make sure you are being as inclusive and respectful as possible of people’s identities.

Case Study: Govern for America

Our client, Govern for America (GFA), is working to build the next generation of public servants to create a more responsive government that better reflects and serves our communities. Through their Fellowship program, they recruit and connect graduating students from diverse backgrounds to full-time, paid government jobs and support them in building the necessary skills to lead in government.

The Govern for America Fellowship application process is a great example of working intentionally and iteratively to collect demographic information with specificity and transparency. Below, the Chief Program Officer of GFA, Sophanite Gedion MPA (she/her), shares more about the process they’ve undertaken to continuously refine the way they collect demographic data from their Fellows to meet their reporting needs:

Our practices around collecting identity-based data have been and continue to be iterative. I believe that this can be best illustrated by the way that our application demographic questions have evolved over the years to better capture racial identity, gender identity, sexual orientation, and a category that we focus on that we call proximate leaders. Over time, we have worked to capture data that more accurately reflects the diversity that exists within our candidate pool and to share the why behind that data capture with our candidates.

See the snapshot below of the identity section in our very first application (for the 2019 Fellowship Class), the language, culture, and identity section. In our first year, we collected narrow information in a limited way, e.g., collecting race only and only capturing the categories "White", "Black", "Hispanic or Latino", and "two or more races" or collecting sexual identity and gender as binary and/or with optional questions. When Govern for America launched, there was a clear and explicit focus on bringing the next generation of public servants to state and local government, so the majority of the information we were collecting was around age, gender, racial identity, and connections to service (captured initially only as a connection to military service).

In our most recent application (for the 2024 Fellowship class) you can see a much more extensive identity section of the application that shares a definition for proximate leaders and some of "the why" on how capturing this information advances our mission. We added all the categories of proximate leaders we have currently identified and are tracking, which span across parts of identity to capture information on race, sexual identity, and socioeconomic diversity all in one question, which allows for analytic ease on our end. We have also added a component to capture pronouns and gender identity as two distinct categories. To respect each candidate's agency, we always include a "prefer not to answer at this time" and, to acknowledge that our list options are not exhaustive, we also include an "other" option. One lesson learned from this year is that we may have created too many racial identity categories to render meaningful insights for the size of our candidate pool (~500). In the next iteration of the application we will need to decide how to retool our categories for maximal representation while also balancing our data needs.
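The small-category problem described in this lesson is something you can check for before analysis. The sketch below is a hypothetical illustration in Python, not GFA's actual process: the `min_count` threshold of 10, the placeholder category names, and the sample data are assumptions chosen only to show the idea of flagging categories that too few respondents selected to support meaningful aggregate insights.

```python
from collections import Counter

def small_categories(selections, min_count=10):
    """Return {category: count} for categories chosen by fewer than min_count respondents."""
    # Each respondent may select several categories (multi-select).
    counts = Counter(choice for selected in selections for choice in selected)
    return {category: n for category, n in counts.items() if n < min_count}

# Hypothetical multi-select answers, for illustration only.
sample_selections = (
    [["Category A"]] * 12
    + [["Category A", "Category B"]] * 3
    + [["Category C"]] * 2
)
flagged = small_categories(sample_selections)
```

Flagged categories are candidates for review, for example reporting them in a broader grouping while still letting respondents choose the more specific option on the form.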
