Digital Education Resources - Vanderbilt Libraries Digital Lab

Getting started with the Constellate platform

A short link to this page is vanderbi.lt/constellate

To access the additional resources made possible through the Vanderbilt Libraries’ subscription, you need to access the website either on campus or using single sign-on (SSO) through the libraries’ proxy. Learn more about the enhanced resources available from the subscription.

Slides from first workshop on 2022-09-14

Slides from second workshop on 2022-09-21

Access

If you are on campus, you can go directly to https://constellate.org and skip the rest of this section. If you are off campus, follow the steps below to connect through the libraries’ proxy.

1. Go to the Constellate record in the libraries’ catalog by searching for Constellate from the library home page or directly through this link.

2. Click on the Constellate link in the Full text availability section.

3. Complete the SSO using your VUNet ID. You will then be connected to the Constellate website through Vanderbilt’s proxy service. You should be able to see this in the URL.

4. At this point, you can access the enhanced resources made possible by the subscription, but for best results, you should create a JSTOR account.

Create account/login

Although it is possible to explore the website without logging in, you need to log in to access the site's resources fully.

5. Click on the Log in link in the upper right of the screen.

6. If you already have a JSTOR account, you can just log in at this point. Otherwise, click on the register one link in the Login with JSTOR credentials popup.

7. Fill out the information in the Register for a free JSTOR account popup, then click Register.

8. After you have completed the registration process, use your credentials to log in. Once you have logged in while accessing the website on campus or using the proxy, your credentials should be associated with the Vanderbilt subscription for 90 days. That will allow you to access the resources by going directly to https://constellate.org and logging in without using the proxy.

Accessing a dataset

9. Once you have logged in, you will see a link in the upper right that says Your dashboard. The dashboard is where you will access any datasets that you have created as well as pre-generated example datasets. Click on that link.

10. For your initial explorations, you can find one of the example datasets and click on Visualize to see what kind of information you can get directly using the graphical interface.

11. To create your own dataset, click on the Builder link in the upper right. If you go there after looking at a previous dataset, you will be refining that dataset, so to start afresh using all possible documents, click the Clear Filters button in the upper left.

12. The maximum dataset size allowed when logged in as a subscriber is 50,000 documents. It takes a long time to build and use datasets of that size, so for experimentation, we recommend creating a smaller dataset of a few hundred documents (a 700-document dataset took about 5 minutes to build). Use the various Filters at the left side of the screen to create a dataset that includes documents relevant to the questions you’d like to ask. The graphical displays on the right will adjust dynamically as you apply filters. When you have selected the documents that you want, click the Build button in the upper right. Give your dataset a nickname and click Confirm. This should return you to the dashboard screen.

13. The new dataset should appear in the All datasets section of the screen. It should indicate that the status of the dataset is Building in progress. When it has finished building, the status message will disappear and you can download or analyze the dataset. Note: large datasets can take many minutes or hours to build. If you leave the website, you can come back to check the status later by going to your dashboard.

Analysis with Constellate Lab

You can use the Constellate Lab to analyze your dataset by clicking on the Analyze link in the dataset box. You can also use the Download link to download the metadata and n-grams that were generated when the dataset was created. For more information, click on the Help link in the upper right. Notes from a session on using these notebooks are also available.
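If you prefer to look at a downloaded file outside of the Lab, the following is a minimal sketch of one way to tally word counts with the Python standard library. It assumes that the download is a gzip-compressed JSON Lines file (one document per line) and that each document stores its word counts under a key named unigramCount. Both the file layout and the key name are assumptions made for illustration, so check them against the file you actually downloaded.

```python
import gzip
import json
from collections import Counter


def total_unigram_counts(path, field="unigramCount"):
    """Sum per-document word counts from a gzipped JSON Lines file.

    Assumption: each line is one JSON document, and `field` (a key name
    assumed for this sketch) maps each word to its count in that document.
    Documents that lack the field are skipped.
    """
    totals = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            document = json.loads(line)
            totals.update(document.get(field) or {})
    return totals


# Example use (the file name is hypothetical):
# totals = total_unigram_counts("my-dataset.jsonl.gz")
# print(totals.most_common(10))
```

Reading the file one line at a time keeps memory use low, which matters for datasets near the upper size limit.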

Jupyter notebook platform

Constellate Lab uses Python and Jupyter notebooks to conduct analyses.

To learn more about Jupyter notebooks, see this video.

To get started in Python on your own, see this lesson series.

To get started learning Python as part of a group, see this web page.

Notebook summary

To see all of the available notebooks, click on the Classes & Tutorials link at the upper right. The tutorials page shows you a list of available Jupyter notebooks by experience level. To run a notebook, click on its link. That will take you to a rendered but un-runnable version of the notebook. Then click on the Open in Constellate Lab link at the upper right to open the notebook in the Jupyter platform.

Here is a classification of notebooks by purpose:

Utilitarian notebooks

Exploring Metadata and Pre-Processing – data wrangling with pandas. Creates a pre-processing filter that reduces the size of a dataset to speed up analysis.

Creating a Stopwords List – by default most notebooks use the NLTK stopwords, but here you can choose a built-in list from NLTK, spaCy, or Gensim. This short lesson saves the stopword list to a CSV file.

Exploring Word Frequencies – gets word counts after filtering out stopwords, then visualizes them.
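To give a sense of what the stopword and word frequency notebooks do, here is a minimal sketch using only the Python standard library. It is not the notebooks' own code: the six-word stopword list, the function names, and the simple regular-expression tokenizer are all stand-ins chosen for illustration, whereas the notebooks draw their lists from NLTK, spaCy, or Gensim.

```python
import csv
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical; real lists are much longer).
STOPWORDS = ["the", "a", "of", "and", "to", "in"]


def save_stopwords(words, path):
    """Write one stopword per row to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for word in words:
            writer.writerow([word])


def load_stopwords(path):
    """Read the CSV file back into a set for fast membership tests."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[0] for row in csv.reader(handle) if row}


def word_frequencies(text, stopwords):
    """Count lowercase word tokens that are not in the stopword set."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


# Example use (the file name is hypothetical):
# save_stopwords(STOPWORDS, "stopwords.csv")
# counts = word_frequencies("The whale and the sea", load_stopwords("stopwords.csv"))
# print(counts.most_common(5))
```

Keeping the list in a CSV file means the same stopwords can be reused across several analyses and edited by hand when a project needs words added or removed.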

Real analysis notebooks

Finding Significant Words using TF/IDF – uses the term frequency-inverse document frequency (TF-IDF) method to rank the words in a document by significance.

Sentiment Analysis with VADER – uses a rule-based system to assign a sentiment score to short social media posts. The second part of the notebook uses scikit-learn to train a model to perform sentiment analysis.

LDA Topic modeling – trains a Latent Dirichlet Allocation (LDA) model to find topics (groups of words that occur together).
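To make the TF/IDF idea in the list above more concrete, here is a minimal sketch in plain Python. It uses one common textbook weighting, term frequency multiplied by log(N / document frequency), and the function name tf_idf is made up for this example. The notebook may rely on a library that applies a different variant of the formula, so the scores here should not be expected to match its output.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score words in each document with a basic TF-IDF weighting.

    `documents` is a list of token lists. Returns one dict per document
    mapping each word to tf * idf, where tf is the word's share of the
    document's tokens and idf is log(N / number of documents with the word).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Example use with two tiny, made-up documents:
# docs = [["whale", "sea", "whale"], ["sea", "ship"]]
# for doc_scores in tf_idf(docs):
#     print(sorted(doc_scores.items(), key=lambda item: item[1], reverse=True))
```

With this weighting, a word that appears in every document gets a score of zero, which is why TF-IDF tends to surface words that are distinctive to a particular document rather than words that are merely frequent.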


Revised 2022-09-20

Questions? Contact us

License: CC BY 4.0.
Credit: "Vanderbilt Libraries Digital Lab - www.library.vanderbilt.edu"