
Teaching Data Literacy with Hugging Face’s AI Sheets

8 min read · Jun 30, 2025

A step-by-step guide that shows how to go from questions to actual data, one column at a time.

Data is the cornerstone of AI, but the real learning happens when you understand how that data is built, step by step, out in the open. Last weekend, I sat down with my kid to discuss charts, graphs, and how to derive meaningful conclusions from data, as part of a class project he is working on. I wanted to emphasize that data isn’t just about numbers; it’s about asking the right questions, organizing the answers clearly, and understanding what the patterns mean.

However, teaching a new concept to a fifth grader isn’t easy. So I leaned into something he was already learning in class: animal habitats and adaptations. I figured I’d use the same ideas to bring this lesson to life. But how?

Show me the Data?

“Data! Data! Data! I can’t make bricks without clay.”
That’s Sherlock Holmes in The Adventure of the Copper Beeches, snapping at Watson. He was stating the obvious: you can’t do analysis without data.

The first step was to build a proper dataset. I opened a spreadsheet and we added names of the animals that he knew or was interested in knowing about. That was all we had. Just names!

A spreadsheet with the animal names in the first column

We needed to populate this table if we were going to extract anything meaningful from it. For that, we needed some questions.

Brainstorming

We started by brainstorming what we actually wanted to know about each animal. What kind of questions would help us learn something new? What would make the dataset interesting enough to explore? Each question would eventually become a column in the dataset.

Questions that we needed answers to. These would eventually become columns of the dataset we wanted to create.

Answering all those questions for every animal would take forever. It wasn’t impossible, but it would mean a ton of web searches. Was there a better way? Perhaps there was, and that’s where AI Sheets comes in.

AI Sheets to the Rescue

A few days ago, I saw the following tweet on my timeline:

This looked like something I could use right away, and that’s exactly what I did. In this article, I’ve shared my own, unbiased experience with the tool, starting from a single column and gradually building a complete, rich dataset. It’s a great playground for experimentation, letting you test ideas quickly while staying in control throughout.

What are AI Sheets?

The home page of AI Sheets

Before moving further, let’s take a moment to understand AI Sheets in a bit more detail. AI Sheets is like a spreadsheet on steroids. Developed by the Hugging Face team, it’s a tool that brings the power of LLMs directly into a familiar table interface. In other words, instead of just storing data, you can ask it to create data. It connects to a wide range of open-source models from the Hugging Face Hub and can also pull live information from the web. You write a prompt, and it fills out the column for you. When it uses web results, it shows the sources, so you can verify the facts yourself. You can explore it here: hf.co/aisheets

Now that you know what AI Sheets can do, let’s get back to building our dataset.

The Starting Point

We imported our CSV file, which contained a single column of animal names, into the AI Sheets environment. This would act as the seed column for the dataset. Once imported, the names were populated in Column A of the sheet, as shown below.

Column A containing the names of the animals. This becomes the first column of the dataset, from which we will derive the other columns.
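
If you want to prepare a similar seed file outside a spreadsheet, here is a minimal Python sketch. It assumes the column header is "Animal Name" (the name used in the prompt placeholders in this article); the function name make_seed_csv and the second animal in the usage note are made up for illustration and are not part of AI Sheets.

```python
import csv
import io


def make_seed_csv(animal_names: list[str], header: str = "Animal Name") -> str:
    """Build the text of a one-column CSV: a header row, then one name per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for name in animal_names:
        cleaned = name.strip()
        if cleaned:  # skip blank entries so the seed column has no empty rows
            writer.writerow([cleaned])
    return buffer.getvalue()
```

Calling make_seed_csv(["African Elephant", "Example Animal"]) returns the CSV text, which you can then write to a file and import as the seed column.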

Adding Basic Information

The next step was to fill in some basic information about the animals. We wanted to capture a few foundational facts about each one: its scientific name, its habitat, and what it eats. For each of these questions, we added a column and wrote a precise prompt to help the model fetch the right information from the web. Here’s how it looked for the second column:

  • Column B: Scientific Name
    Prompt: What is the scientific name of the {{Animal Name}}? Only mention the name.
Populating Column B via web search, using the names from Column A.

I did a quick Google search to check whether the results were actually correct. By clicking the 🌐 icon, I could also access the sources and verify the results. This is the beauty of this tool: you can manually edit anything you think is incorrect, or let it take over. For instance, I got Loxodonta africana as the scientific name of the elephant. But when I checked the sources, I realized this refers specifically to the African elephant. So I had to be more precise about which kind I meant. I edited the corresponding entry in Column A and renamed it to African Elephant.

Checking sources to verify the results

We followed the same process for the next three columns as well:

  • Column C: Habitat
    Prompt: What is the natural habitat of the {{Animal Name}}?
  • Column D: Diet
    Prompt: What does the {{Animal Name}} typically eat?
  • Column E: Average Lifespan
    Prompt: What is the average lifespan of {{Animal Name}}? Give the answer as age, in numbers only.
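
To make the idea of one prompt applied down a column concrete, here is a rough Python sketch of placeholder substitution. It only mimics the {{Column}} syntax seen in the prompts above; it is not how AI Sheets is implemented, and the function name render_prompt and the second sample row are assumptions made for illustration.

```python
import re

# Matches placeholders such as {{Animal Name}}; the syntax is copied from the prompts above.
PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def render_prompt(template: str, row: dict[str, str]) -> str:
    """Replace every {{Column}} placeholder with that column's value for one row."""

    def substitute(match: re.Match) -> str:
        column = match.group(1).strip()
        if column not in row:
            raise KeyError(f"Row has no column named {column!r}")
        return row[column]

    return PLACEHOLDER.sub(substitute, template)


# Illustrative rows; "Example Animal" is a stand-in, not a value from the real sheet.
rows = [{"Animal Name": "African Elephant"}, {"Animal Name": "Example Animal"}]
template = "What is the natural habitat of the {{Animal Name}}?"

# One template, one rendered prompt per row.
prompts = [render_prompt(template, row) for row in rows]
```

Each rendered prompt is what a model would receive for that row, which is why a vague name in Column A (such as plain "Elephant") produces a vague or unexpected answer.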

Instead of manually googling facts for every animal, I could ask a clear, well-phrased question once and apply it down a column. The dataset had started to take shape. This is how it looked at the end of our second step.

Dataset after filling in columns from B to E.

💡 Tips to Improve Feedback

Refine Prompts to Improve Output
The way you phrase a question matters. Even a small tweak in wording can completely change the kind of answer you get. If something feels off, just adjust the prompt and re-run it. You can keep refining until the output feels right.

Use Feedback to Guide the Model
Clicking the 👍 helps the app learn from good examples. Over time, this improves the quality of future completions for that column.

Adding More Detail

For each animal, we wanted a bit more than just basic facts. Understanding how animals adapt to their surroundings helps link their behavior to environmental pressures, so we included this in our next column.

  • Column F: Adaptation
    Prompt: What are some adaptations that help the {{Animal Name}} survive in its environment?

💡 These insights would also help set up the next column, where we would summarize what we’d learned.

Summarizing Key Facts

After identifying unique adaptations, we needed a way to make the information more digestible. The goal was to pull out the most important points from a longer paragraph and capture them clearly. At the same time, we wanted to keep the original column in case we needed to refer back to it. So we created a new column.

  • Column G: Adaptation Summary
    Prompt: Based on its {{Adaptation}}, summarize it in 5 keywords. The keywords shouldn’t include {{Animal Name}}, the word: Adaptation, or anything which is not an adaptation.
Comparing the models’ answers side by side to see which output works for our use case.

This step helped simplify complex ideas, making them easier to visualize later on. As shown above, we compared results from two different models to see which worked better for our use case.

💡 Different models handle summarization differently, so the results depend a lot on model choice. There are a lot of open models available, and AI Sheets makes swapping models easy and visible, which helped us compare outputs in a real, hands-on way.
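
When comparing models, it can help to check each output against the rules written into the summary prompt. The sketch below is one possible way to do that in Python. It assumes the model returns its keywords as a comma-separated string, which may not always hold, and the function name check_keywords is made up for this example. The third rule in the prompt (nothing that is not an adaptation) still needs a human eye.

```python
def check_keywords(summary: str, animal_name: str, expected_count: int = 5) -> list[str]:
    """Return a list of problems found in a comma-separated keyword summary."""
    keywords = [k.strip() for k in summary.split(",") if k.strip()]
    problems = []

    # Rule 1: the prompt asks for exactly five keywords.
    if len(keywords) != expected_count:
        problems.append(f"expected {expected_count} keywords, got {len(keywords)}")

    # Rule 2: no keyword may contain the animal's name or the word "adaptation".
    banned = {"adaptation", *animal_name.lower().split()}
    for keyword in keywords:
        hit = set(keyword.lower().split()) & banned
        if hit:
            problems.append(f"{keyword!r} contains banned word(s): {sorted(hit)}")

    return problems
```

An empty list means the output followed the countable rules; running the same check on two models’ outputs gives a quick, repeatable way to see which one sticks to the prompt.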

Creating a Derived Column

Once we had enough information across different columns, we wanted to see if we could generate something new by combining what was already there. So we added a derived column that pulled from earlier fields:

  • Column K: Food Chain Role
    Prompt: Based on the {{Habitat}} and {{Diet}} of the {{Animal Name}} is it most likely a herbivore, carnivore, omnivore, or scavenger in its food chain?

💡 We didn’t need web search for this one. The data already in the sheet was enough. It’s a good example of how existing fields can work together to create new insights.
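
Here is a small Python sketch of the same idea: a prompt built from several existing columns, plus a helper that reduces a free-text answer to one of the four roles named in the prompt. The helper names (build_prompt, normalize_role) and the sample Habitat and Diet values are illustrative assumptions, not output from AI Sheets.

```python
from typing import Optional

ROLES = ("herbivore", "carnivore", "omnivore", "scavenger")

TEMPLATE = (
    "Based on the {{Habitat}} and {{Diet}} of the {{Animal Name}} is it most likely "
    "a herbivore, carnivore, omnivore, or scavenger in its food chain?"
)


def build_prompt(row: dict[str, str]) -> str:
    """Fill the template using values that already exist in the row."""
    prompt = TEMPLATE
    for column, value in row.items():
        prompt = prompt.replace("{{" + column + "}}", value)
    return prompt


def normalize_role(answer: str) -> Optional[str]:
    """Return the single role named in the answer, or None if zero or several appear."""
    text = answer.lower()
    found = [role for role in ROLES if role in text]
    return found[0] if len(found) == 1 else None


# Placeholder values for illustration only.
row = {"Animal Name": "African Elephant", "Habitat": "savanna", "Diet": "grasses and leaves"}
prompt = build_prompt(row)
```

Returning None for ambiguous answers is a deliberate choice: those are the cells worth reviewing by hand rather than guessing.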

Adding Visuals

AI Sheets also supports text-to-image generation, which means we could go beyond text and create visuals directly in the sheet. Kids learn better when they can see what they’re reading. This time, we didn’t have to write a prompt, since image generation is available as a method in the dropdown. Just like with text, there are plenty of open-source models to choose from. We chose FLUX.1 from Black Forest Labs since the results were great.

The Final Result

After completing these steps, our simple list turned into a rich, multi-faceted dataset, which you can then download in CSV or Parquet format. We went from a single column to something meaningful, and then used it for data visualisation. Here’s how the final version looks:

Final version of the dataset
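
Once the dataset is downloaded as CSV, a few lines of Python are enough to start exploring it. The sketch below counts how many animals fall into each food chain role and prints a simple text bar chart. The inline sample stands in for a real export: its column names follow this article, but the rows other than the African Elephant are placeholders, and count_by_column is a name made up for this example.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported file; "Example Animal" rows are placeholders.
SAMPLE_CSV = """Animal Name,Food Chain Role
African Elephant,herbivore
Example Animal B,carnivore
Example Animal C,herbivore
"""


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count how often each value appears in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column].strip().lower() for row in reader)


counts = count_by_column(SAMPLE_CSV, "Food Chain Role")

# Print a text bar chart, most common role first.
for role, n in counts.most_common():
    print(f"{role:<10} {'#' * n}")
```

For a real export, read the file’s contents (for example with open(path, newline="")) and pass the text to count_by_column.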

Conclusion

In this article, I shared my experience of using AI Sheets.
I started with just one column and gradually built it up. I could fix cells manually or feed examples to improve the AI’s suggestions. Since each column had its own prompt and logic, I didn’t have to keep starting over as you often do in a chatbot. You stay in control, the sources are visible, and it actually scales. AI Sheets is a great tool, and I’m looking forward to the new additions the team makes to this product.

Written by Parul Pandey

Prev - Principal Data Scientist @H2O.ai | Author of Machine Learning for High-Risk Applications