Upload Your Own Data

Sometimes you may have very local data that is not yet in LiveStories' Data Library. In these cases, you can upload the dataset directly into your account. Find out how to do that here!

This page covers a lot of ground, but if you still run into problems working with your datasets, please don't hesitate to contact us at support@livestories.com.

How to Upload a Dataset 

The video below shows where to upload your own files in both the original version of LiveStories and the Next version. The uploaded files should have the same structure and format (CSV or Google Sheet) in either version.

Initial Considerations

What data should I include?

In general, there are two reasons to upload datasets to LiveStories: (1) to create charts based on the data, and (2) to publish the datasets themselves for others to download directly. 

If you are using a dataset to create charts, the best practice is to limit the data in your file to what you need. Extra information, particularly extra columns, takes longer to index, and charts based on it render more slowly.
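If your source file has more columns than you need, you can trim it before uploading. Here is a minimal sketch in Python using only the standard library; the function name and the column names (including "Internal Notes") are made up for illustration.

```python
import csv
import io


def keep_columns(csv_text: str, wanted: list[str]) -> str:
    """Return a copy of csv_text that contains only the columns named in `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not listed in `wanted`
    writer = csv.DictWriter(out, fieldnames=wanted, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical example: the "Internal Notes" column is not needed for charting.
sample = "State,Year,Payer,ER Visits,Internal Notes\nGeorgia,2014,Medicaid,100,check source\n"
print(keep_columns(sample, ["State", "Year", "Payer", "ER Visits"]))
```

Save the result as a new CSV file and upload that one; your original file stays untouched as a backup.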

How big can my dataset be?

LiveStories' capacity for file management is continually improving. A file with 30 columns or fewer generally responds quickly. Files with 80 columns have worked, but those had fewer than 1,000 rows. A file with 10 columns or fewer can be fast even with 100,000 rows.

The current maximum size for a CSV upload is about 50 MB. However, files smaller than 20 MB chart significantly faster.
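If you prepare files with a script, a quick size check before uploading can save a failed upload. This is a minimal Python sketch; the thresholds come from the paragraph above, while treating "MB" as one million bytes and the function names are assumptions.

```python
import os

# Assumption: "MB" is read as 1,000,000 bytes; the limits above are approximate anyway.
MB = 1_000_000
UPLOAD_LIMIT = 50 * MB  # approximate maximum CSV upload size
FAST_LIMIT = 20 * MB    # files below this size chart significantly faster


def size_advice(num_bytes: int) -> str:
    """Classify a file size against the limits described above."""
    if num_bytes > UPLOAD_LIMIT:
        return "too large to upload"
    if num_bytes >= FAST_LIMIT:
        return "can upload, but charts may be slow"
    return "ok"


def check_file(path: str) -> str:
    """Look up the size of a file on disk and classify it."""
    return size_advice(os.path.getsize(path))
```

For example, `check_file("my_data.csv")` (a hypothetical file name) returns one of the three messages.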

Uploading Datasets

Microsoft Excel

CSV (comma-separated values) files can easily be created in Excel: simply select “Save As” and choose the CSV file type. It doesn’t matter whether you choose CSV, CSV (MS-DOS), or CSV (Macintosh).

Make sure to save Excel files in the CSV format.

Google Sheets 

Google Sheets can be uploaded without modification. But if your sheet has multiple tabs, only the one currently showing will upload to LiveStories.

If the Nightly Google Sheet Update feature is enabled on your account, you can set a Google Sheet to update automatically: View Properties of the file, and you will see a red toggle switch that turns on nightly updates for that file. LiveStories turns this feature on for you; contact LiveStories support if you're interested in learning more.

API Import

LiveStories supports two kinds of APIs.

The Public API allows anyone to download public datasets from a LiveStories account. (Use the lock/globe toggle on the dataset tile to make the dataset publicly available.)

The Partners' API allows interaction between LiveStories accounts and external databases. Through it, datasets can be uploaded, downloaded, updated, or deleted.

Further information on how to enable both kinds of API can be found on our LiveStories Developer's Page.

Updating and deleting datasets

Datasets can be deleted, but any charts based on those datasets will disappear, even from published stories! Deletion is permanent, so be careful, and keep a backup file on your computer.

If you want to modify a dataset that is in use by charts, simply update it. Use the Update command to completely overwrite the dataset with the new version. (Your metadata remains intact, though you may want to mention that you updated the dataset.) By using the Update feature, you can easily add more rows and even more columns. You should see no difference in your existing charts, dashboards, and stories—except that they now have the new data. You don’t have to republish your stories or dashboards to use the new data, though you may need to refresh your browser.

Please note: if you change column names from the previous dataset, you will need to rebuild any charts that use those column names. The charts will still be there, but they will be missing the data from the column with the updated name.
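Before you update a dataset, it can help to compare the old and new headers so you know which charts may need rebuilding. This is a minimal Python sketch, assuming you have both versions as CSV text; the function names and sample columns are hypothetical.

```python
import csv
import io


def header_of(csv_text: str) -> list[str]:
    """Return the first row (the column headers) of a CSV file."""
    return next(csv.reader(io.StringIO(csv_text)))


def dropped_columns(old_csv: str, new_csv: str) -> list[str]:
    """List columns in the old file that are missing, with the exact same spelling, from the new file."""
    new_headers = set(header_of(new_csv))
    return [name for name in header_of(old_csv) if name not in new_headers]


old = "State,Year,ER Visits\nGeorgia,2014,100\n"
new = "State,Year,ER_Visits,Rate\nGeorgia,2014,100,5.2\n"
print(dropped_columns(old, new))  # prints ['ER Visits']
```

Any chart built on a column in that list is one you would need to rebuild after the update.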

Adding metadata

All open/published data should ideally include information about the dataset: the source of the data, how the data were derived, what population estimates were used, how certain fields were calculated, and so on. This information should be entered in the metadata area of your dataset. You can see this area by selecting "View More."

Even if you don’t intend your data sets to be downloaded by the public, adding metadata can be very useful for internal tracking. 

Cleaning Your Data

Remove titles, notes and blank lines.

First, remove all titles, commas, notes, and extraneous features from your dataset. Your data set should have only column headers and data beneath those column headers.

Also: make sure that every column has a header!

Uncleaned: This file has a missing column header, blank lines, and a super-header. These will confuse the database.

Cleaned: This data set has only column headers and data. This is easily interpreted by the database.
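To catch problems like those in the uncleaned example before uploading, you can scan a CSV file for empty header cells and blank lines. This Python sketch is a rough check of our own, not how LiveStories validates files; it assumes no cell contains a line break, so the reported line numbers match the file.

```python
import csv
import io


def layout_problems(csv_text: str) -> list[str]:
    """Report empty header cells and blank lines in a CSV file."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    # A title or super-header on the first line usually shows up as empty header cells.
    for position, name in enumerate(rows[0], start=1):
        if not name.strip():
            problems.append(f"column {position} has no header")
    # Record numbers equal line numbers only if no cell contains a line break.
    for line_number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"line {line_number} is blank")
    return problems


sample = "Year,,Rate\n2014,Medicaid,5\n\n2015,Medicare,6\n"
print(layout_problems(sample))  # prints ['column 2 has no header', 'line 3 is blank']
```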

Be sure that your dataset has only one value per cell.

Make sure your entries are uniform.

Computers are extremely literal. You need to ensure that all your text strings are spelled the same if you intend LiveStories to interpret them the same. For example, the name “Rim of the World” is not the same as “Rim of the world” to LiveStories (note the capitalization of “world”). If your dataset contains two versions of a name, they will be treated as separate names!

A good way to check for this is to use the FILTER feature in Excel or Google Sheets. First, apply filters to all your columns. Then, open the drop-down list for each text column. The program will show you every unique text value in that column, so you can quickly see whether you have multiple spellings, capitalizations, or other errors.
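If you prefer a script to the filter drop-down, this Python sketch groups the values of a text column after lowercasing them and trimming surrounding spaces, then reports any group with more than one spelling. It only catches capitalization and spacing differences, not typos; the "Region" column name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def spelling_variants(csv_text: str, column: str) -> dict[str, set[str]]:
    """Find values in one column that differ only in capitalization or surrounding spaces."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[column]
        groups[value.strip().lower()].add(value)
    # Keep only the groups where more than one spelling was found.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


sample = "Region,Count\nRim of the World,3\nRim of the world,4\nBig Bear,2\n"
print(spelling_variants(sample, "Region"))
```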

Cleaning numbers

You can use filters to check numeric columns, too. Numeric columns can contain only numbers, so remove any error marks or placeholders that may be in your dataset, such as #N/A, #REF, negative values used as placeholders, ***, or ---. Marks like these are often used to stand for “no value” or as data indicators. (The Find/Replace feature is useful for this.) Move any comments that accompany your data off the sheet; you can include them in the metadata section on the platform.
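If you clean files with a script instead of Find/Replace, a per-cell function like this Python sketch can do the same job. The placeholder set mirrors the examples above, and treating every negative number as a placeholder is an assumption you can switch off with `drop_negative=False`. A result of `None` means the cell should be left empty in the cleaned file.

```python
import math
from typing import Optional

# Placeholder marks mentioned above; extend this set for your own data.
PLACEHOLDERS = {"#N/A", "#REF", "#REF!", "***", "---", ""}


def clean_number(cell: str, drop_negative: bool = True) -> Optional[float]:
    """Return the cell as a number, or None if it is a placeholder or not a number."""
    text = cell.strip()
    if text in PLACEHOLDERS:
        return None
    try:
        value = float(text.replace(",", ""))  # also removes thousands separators
    except ValueError:
        return None
    if not math.isfinite(value):  # float() accepts "nan" and "inf"; reject them
        return None
    # Assumption: negative values in this column are placeholder codes, not real data.
    if drop_negative and value < 0:
        return None
    return value
```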

How do I get rid of the “*” character?

The * character is a wildcard in Excel: if you search for it in a “Find” command, it will match everything. But if you place a tilde (~) before the * (that is, search for ~*), Excel will treat the * as a literal character, not a wildcard.

Organizing Your Dataset

This video will give you an overview of how to structure your datasets to achieve your visualization goals.

Humans and computers see things differently.

Humans like to organize data so they can scan it quickly and find the number they want. For example, the original, human-readable version of a dataset on emergency room visits by insurance payer in Georgia might look like this:

Humans can read this chart easily by cross-referencing the columns and rows, but computers can't interpret it as easily. This file also has a super-header (Year).

Computers are deeply unimaginative: LiveStories reads flat files the way a database reads a table, one row at a time.

Therefore, each row should only include data about a single observation in time and place.

Below is the same dataset, formatted for a computer. In this file, one row contains a single observation (Georgia, 2014, Medicaid), which includes a location, a time, a type of insurer, and two values to chart: (1) the number of ER visits and (2) the rate of ER visits per 100,000 residents. Notice that the list of payers repeats for each year.

If this dataset included multiple states, the list of payers for each year would repeat for each additional state.

For nationwide datasets, you may end up using thousands of rows. That’s okay! LiveStories can handle more than a million rows in a file.

In a computer-readable dataset, one row is one observation, with each column containing a different aspect of that observation.
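If your data arrives in the human-friendly layout (one column per year), you can convert it into one-row-per-observation form with a short script. This is a minimal Python sketch using only the standard library; the numbers are made up, and it handles a single measure, so a file with both visit counts and rates under each year would need one extra step.

```python
import csv
import io


def wide_to_long(csv_text: str, id_columns: list[str], var_name: str, value_name: str) -> str:
    """Unpivot a wide table: each non-ID column becomes its own row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    value_columns = [c for c in reader.fieldnames if c not in id_columns]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(id_columns + [var_name, value_name])
    for row in reader:
        for column in value_columns:
            # One output row = one observation: the ID values, the former column name, its value.
            writer.writerow([row[c] for c in id_columns] + [column, row[column]])
    return out.getvalue()


# Made-up numbers, laid out the "human" way: one column per year.
wide = "State,Payer,2014,2015\nGeorgia,Medicaid,100,110\nGeorgia,Medicare,80,85\n"
print(wide_to_long(wide, ["State", "Payer"], "Year", "ER Visits"))
```

If you already use pandas, its `melt` function (with `id_vars`, `var_name`, and `value_name`) performs the same reshaping.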

Measures and dimensions

In this Georgia Insurance Payer dataset:

Measures are the values (ER Visits and ER Visit Rate per 100,000). Think of measures as your actual data points, something that can be measured. Measures are what actually gets charted.

Dimensions can be thought of as “ways to look at your data.” In this dataset, the year, state, and type of insurer columns are the dimensions. This format allows you to total or average your measure values by your dimensions: for example, you can total or average the number of ER visits by type of payer, by year, or both.
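Totaling a measure by one or more dimensions is essentially a group-by. This Python sketch illustrates the idea on made-up numbers in the computer-readable layout; it is not how LiveStories computes its charts internally.

```python
import csv
import io
from collections import defaultdict

# Made-up data in the computer-readable layout: one row per observation.
LONG = """State,Year,Payer,ER Visits
Georgia,2014,Medicaid,100
Georgia,2015,Medicaid,110
Georgia,2014,Medicare,80
Georgia,2015,Medicare,85
"""


def total_by(csv_text: str, dimensions: list[str], measure: str) -> dict[tuple, float]:
    """Sum a measure column for each combination of the given dimension columns."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[d] for d in dimensions)
        totals[key] += float(row[measure])
    return dict(totals)


print(total_by(LONG, ["Payer"], "ER Visits"))  # by type of payer
print(total_by(LONG, ["Year"], "ER Visits"))   # by year
```

Passing both `["Year", "Payer"]` groups by the two dimensions together.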

Counts

LiveStories can also count the number of times a certain dimension appears in a data set. If you create a chart with only that dimension, the count—number of times it appears—is what will be returned.
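Counting works the same way, except that no measure is needed: each row simply adds one to the tally for its value. A minimal Python sketch on made-up rows:

```python
import csv
import io
from collections import Counter


def count_by(csv_text: str, dimension: str) -> Counter:
    """Count how many rows contain each value of one dimension column."""
    return Counter(row[dimension] for row in csv.DictReader(io.StringIO(csv_text)))


sample = "State,Year,Payer\nGeorgia,2014,Medicaid\nGeorgia,2014,Medicare\nGeorgia,2015,Medicaid\n"
print(count_by(sample, "Payer"))
```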

Column Names and Data Types

LiveStories recognizes certain column names as clues to the type of data they contain. 

These headers are generally recognized as dimensions:

Time: Date, Year, Month, Week, Day

Descriptors: Age, ID, Serial number, Status, Type, Category

Location: Country, State, County, City, ZipCode, Zip Code, Census Tract

Special: Latitude, Longitude (These should be used together, but the columns don't have to be adjacent.)
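To review your headers against this list before uploading, you could use a lookup like the Python sketch below. The keywords come from the list above, but exact, case-insensitive matching is our assumption; LiveStories' actual matching rules may differ. Headers that match nothing return "other" (a measure, or a dimension such as "Payer" that isn't a keyword).

```python
# Keywords from the list above. Exact, case-insensitive matching is an assumption;
# LiveStories' real rules may differ.
DIMENSION_KEYWORDS = {
    "time": {"date", "year", "month", "week", "day"},
    "descriptor": {"age", "id", "serial number", "status", "type", "category"},
    "location": {"country", "state", "county", "city", "zipcode", "zip code", "census tract"},
    "coordinate": {"latitude", "longitude"},
}


def guess_role(header: str) -> str:
    """Return the keyword group a header belongs to, or "other" if it matches none."""
    name = header.strip().lower()
    for role, keywords in DIMENSION_KEYWORDS.items():
        if name in keywords:
            return role
    return "other"


for header in ["Year", "Zip Code", "Latitude", "ER Visits"]:
    print(header, "->", guess_role(header))
```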

Mapping

If you want a column to visualize in a map, you must use one of the location keywords as the column title. (Datasets using custom shape files are slightly different, and will be covered elsewhere.)

Good column names for measures

%, Percent, No, Number, Value, or pretty much anything that isn't a reserved keyword.

Keeping column names short is a good idea. Put descriptive information on your data in the metadata field of the dataset.

Formatting number spans

Often, dimensions are represented as spans in datasets. For example:

Year: 2000 - 2002, 2003 - 2004, or Age: 1 to 4, 5 to 10, 11 to 20. 

Note: In a spreadsheet, if you type "1-4", Excel will read it as a date and convert it to January 4 before it is uploaded to LiveStories. To get around this, use "1 to 4".

Data like this will be treated as text by the program, though it can still be “alphabetized” in numeric order.
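If your data comes from a source other than Excel (for example, a CSV exported from another program), you can rewrite hyphenated spans as "X to Y" before the file is ever opened in Excel. This Python sketch is one way to do it; it only handles two whole numbers joined by a hyphen, leaves every other cell alone, and cannot recover values Excel has already turned into dates.

```python
import re

# Matches a whole cell such as "1-4" or "2000 - 2002" (two whole numbers joined by a hyphen).
SPAN = re.compile(r"\s*(\d+)\s*-\s*(\d+)\s*")


def span_to_text(cell: str) -> str:
    """Rewrite a hyphenated number span as "X to Y"; leave other cells unchanged."""
    match = SPAN.fullmatch(cell)
    if match is None:
        return cell
    return f"{match.group(1)} to {match.group(2)}"


for cell in ["1-4", "2000 - 2002", "-5", "Medicaid"]:
    print(repr(cell), "->", repr(span_to_text(cell)))
```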

Still, you can’t place range information in a column labeled “Year,” because LiveStories will be expecting an integer. Columns with range data should be labeled something like “Time Span” or “Age Range.”
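A quick way to confirm that a "Year" column holds only whole numbers is to try converting each value, as in this Python sketch. The function name is hypothetical, and it assumes the column is literally named "Year" unless you pass another name.

```python
import csv
import io


def non_integer_years(csv_text: str, column: str = "Year") -> list[str]:
    """Return the values in a Year column that are not whole numbers, such as ranges."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[column]
        try:
            int(value)
        except ValueError:
            bad.append(value)
    return bad


sample = "Year,Rate\n2014,5\n2000 to 2002,6\n2015,7\n"
print(non_integer_years(sample))  # prints ['2000 to 2002']
```

If anything is reported, rename the column (for example, to "Time Span") rather than forcing the ranges into it.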

Troubleshooting

A column won't chart correctly

As discussed above, LiveStories will interpret certain keywords in column names as corresponding dimensions—and try to chart them accordingly. For example, if your column name has the word "State" in it, LiveStories will treat it as geographic data. 

It is a good idea to avoid using column name keywords outside their intended contexts. For example, if you label a column “State Taxes”, and then put financial data in it, LiveStories will have difficulty charting it outside of a map context. Use an abbreviation, like "St_taxes" instead. Once your data set is uploaded, you can edit the column name for your readers using the View More / Edit Metadata feature on the dataset tile.
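To find headers like “State Taxes” before uploading, you can search each header for location keywords appearing as whole words inside a longer name. This Python sketch checks only the location keywords from the list above, and whole-word, case-insensitive matching is our assumption about how LiveStories spots them.

```python
import re

# Location keywords from the list above. Whole-word, case-insensitive matching is an
# assumption about how LiveStories spots them; its real rules may differ.
LOCATION_KEYWORDS = ["country", "state", "county", "city", "zipcode", "zip code", "census tract"]


def keyword_collisions(headers: list[str]) -> dict[str, list[str]]:
    """Flag headers that contain a location keyword but are not just that keyword."""
    flagged = {}
    for header in headers:
        name = header.strip().lower()
        hits = [
            word for word in LOCATION_KEYWORDS
            if name != word and re.search(r"\b" + re.escape(word) + r"\b", name)
        ]
        if hits:
            flagged[header] = hits
    return flagged


print(keyword_collisions(["State", "State Taxes", "St_taxes", "ER Visits"]))
```

A flagged header is not necessarily wrong; it is just worth a second look.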

Untitled headers

Sometimes, when you upload a dataset, you see columns labeled “untitled headers.” These are empty “ghost” columns that Excel preserves when a column is deleted. They get uploaded to LiveStories as empty columns with no names, even if you can't see them in your CSV file.

You can get rid of untitled headers by creating a new blank file. Copy the data from the old file—only select the rows and columns you’re using—and paste it into the clean, new file. Then upload the new file instead.
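If copying into a new file is inconvenient, this Python sketch does a similar cleanup directly on the CSV text: it drops columns whose header and cells are all empty, along with fully blank rows. Columns that have data but no header are kept, since those need a header added by hand. The function name is hypothetical.

```python
import csv
import io


def drop_ghost_columns(csv_text: str) -> str:
    """Remove columns whose header and cells are all empty, plus fully blank rows."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if any(cell.strip() for cell in row)]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
    keep = [i for i in range(width) if any(row[i].strip() for row in rows)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()


print(drop_ghost_columns("Year,Rate,,\n2014,5,,\n2015,6,,\n"))
```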