Digital Education Resources - Vanderbilt Libraries Digital Lab
A short link to this page is vanderbi.lt/constellate
To access the additional resources made possible through the Vanderbilt Libraries’ subscription, you need to reach the website either from campus or through the libraries’ proxy using single sign-on (SSO). Learn more about the enhanced resources available from the subscription.
Slides from first workshop on 2022-09-14
Slides from second workshop on 2022-09-21
If you are on campus, you should be able to simply go to https://constellate.org and skip the rest of this section.
1. Go to the Constellate record in the libraries’ catalog, either by searching for Constellate from the library home page or directly through this link.
2. Click the Constellate link in the Full text availability section.
3. Complete the SSO using your VUNet ID. You will then be connected to the Constellate website through Vanderbilt’s proxy service, which you should be able to confirm by looking at the URL.
4. At this point, you can access the enhanced resources made possible by the subscription, but for best results, you should create a JSTOR account.
Although you can explore the website without logging in, you need to log in to access the site’s full resources.
5. Click the Log in link in the upper right of the screen.
6. If you already have a JSTOR account, you can log in now. Otherwise, click the “register one” link in the Login with JSTOR credentials popup.
7. Fill out the information in the Register for a free JSTOR account popup, then click Register.
8. After you have completed the registration process, use your credentials to log in. Once you have logged in while accessing the website on campus or through the proxy, your credentials should be associated with the Vanderbilt subscription for 90 days. During that time, you can access the resources by going directly to https://constellate.org and logging in, without using the proxy.
9. Once you have logged in, you will see a Your dashboard link in the upper right. Click that link. The dashboard is where you access the datasets you have created, as well as pre-generated example datasets.
10. For your initial explorations, you can find one of the example datasets and click on Visualize to see what kind of information you can get directly using the graphical interface.
11. To create your own dataset, click the Builder link in the upper right. If you go there after looking at a previous dataset, you will be refining that dataset. To start afresh with all available documents, click the Clear Filters button in the upper left.
12. The maximum dataset size allowed when logged in as a subscriber is 50,000 documents. Datasets of that size take a long time to build and use, so for experimentation, we recommend creating a smaller dataset of a few hundred documents (a 700-document dataset took about 5 minutes to build). Use the various Filters on the left side of the screen to create a dataset of documents relevant to the questions you’d like to ask. The graphical displays on the right adjust dynamically as you apply filters. When you have selected the documents you want, click the Build button in the upper right. Give your dataset a nickname and click Confirm. This should return you to the dashboard.
13. The new dataset should appear in the All datasets section of the screen with the status Building in progress. When it has finished building, the status message will disappear and you can download or analyze the dataset. Note: large datasets can take anywhere from many minutes to several hours to build. If you leave the website, you can check the status later by returning to your dashboard.
You can use the Constellate Lab to analyze your dataset by clicking the Analyze link in the dataset box. You can also use the Download link to download the metadata and n-grams generated when the dataset was created. For more information, click the Help link in the upper right.
Notes from a session on using the Constellate Lab notebooks.
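If you would rather work with a downloaded dataset outside the Constellate Lab, a few lines of Python are enough to start exploring the n-grams. The sketch below makes several assumptions that are not stated on this page. It assumes the download is a gzip-compressed JSON Lines file (`.jsonl.gz`) with one document per line. It also assumes that each document stores its word counts under a `"unigramCount"` key. Check both against your actual file. The function names and the file name `my-dataset.jsonl.gz` are hypothetical.

```python
import gzip
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines input: one JSON object per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Stream documents from a gzip-compressed JSON Lines file.

    Assumption: the downloaded dataset is a .jsonl.gz file.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_documents(handle)


def total_unigrams(documents: Iterable[dict]) -> Counter:
    """Add up per-document unigram counts across the whole dataset.

    Assumption: each document stores a "unigramCount" mapping of word -> count.
    Documents without that key contribute nothing.
    """
    totals: Counter = Counter()
    for doc in documents:
        totals.update(doc.get("unigramCount", {}))
    return totals


if __name__ == "__main__":
    # Two hypothetical documents standing in for lines of a real download.
    sample = [
        '{"id": "doc1", "unigramCount": {"library": 4, "data": 1}}',
        '{"id": "doc2", "unigramCount": {"data": 2}}',
    ]
    print(total_unigrams(iter_documents(sample)).most_common(2))
    # prints [('library', 4), ('data', 3)]
    # For a real file: total_unigrams(read_jsonl_gz("my-dataset.jsonl.gz"))
```

The reader is a generator that handles one line at a time, so the whole file never has to sit in memory. Only the running word totals are kept.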
Constellate Lab uses Python and Jupyter notebooks to conduct analyses.
To learn more about Jupyter notebooks, see this video.
To get started in Python on your own, see this lesson series.
To get started learning Python as part of a group, see this web page.
To see all of the available notebooks, click the Classes & Tutorials link at the upper right. The tutorials page lists the available Jupyter notebooks by experience level. To run a notebook, click its link. This takes you to a rendered but non-runnable version of the notebook. Then click the Open in Constellate Lab link at the upper right to open the notebook in the Jupyter platform.
Here is a classification of notebooks by purpose:
Utilitarian notebooks
Exploring Metadata and Pre-Processing – various kinds of data wrangling using pandas. The notebook also creates a pre-processing filter that reduces the size of a dataset and speeds up analysis.
Creating a Stopwords List – by default, most notebooks use the NLTK stopwords, but here you can select a built-in list from NLTK, spaCy, or Gensim. This short lesson saves the chosen stopword list to a CSV file.
Exploring Word Frequencies – counts words after filtering out stopwords, then visualizes the results.
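The two utilitarian steps above, saving a stopword list to CSV and then counting words with those stopwords removed, can be sketched in plain Python. This is an illustration, not the notebooks’ code. It assumes a one-column CSV with one stopword per row. It also uses a deliberately simple tokenizer that keeps only runs of the letters a to z, so digits and apostrophes are discarded. The four-word stopword list and the function names are hypothetical. The notebooks draw their lists from NLTK, spaCy, or Gensim instead.

```python
import csv
import os
import re
import tempfile
from collections import Counter
from typing import Iterable


def save_stopwords(words: Iterable[str], path: str) -> None:
    """Write a stopword list to a one-column CSV file, one word per row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for word in sorted(set(words)):
            writer.writerow([word])


def load_stopwords(path: str) -> set[str]:
    """Read a one-column CSV of stopwords back into a lowercase set."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[0].strip().lower() for row in csv.reader(handle) if row}


def word_frequencies(text: str, stopwords: set[str]) -> Counter:
    """Lowercase the text, keep runs of letters a-z, drop stopwords, and count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


if __name__ == "__main__":
    # A tiny hypothetical stopword list; the notebooks use NLTK, spaCy, or Gensim lists.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "stopwords.csv")
        save_stopwords(["the", "of", "and", "a"], path)
        stops = load_stopwords(path)
    text = "The history of the library and the history of the archive"
    print(word_frequencies(text, stops).most_common(3))
    # prints [('history', 2), ('library', 1), ('archive', 1)]
```

Storing the list in a CSV file means you can edit it by hand, for example to add words common to your field, and reuse it across analyses.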
Real analysis notebooks
Finding Significant Words using TF/IDF – uses the term frequency–inverse document frequency (TF-IDF) method to rank the words in a document by significance.
Sentiment Analysis with VADER – uses a rule-based system to assign a sentiment score to short social media posts. The second part of the notebook uses scikit-learn to train a model to perform sentiment analysis.
LDA Topic modeling – trains a Latent Dirichlet Allocation (LDA) model to find topics (groups of words that tend to occur together).
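To see what TF-IDF ranking does, here is a minimal pure-Python sketch of one common variant. It is not the notebook’s implementation. The term frequency (tf) is a word’s count divided by the document’s length. The inverse document frequency (idf) is log(N / df), where N is the number of documents and df is the number of documents containing the word. A word that appears in every document therefore scores 0. Words that are frequent in one document but rare elsewhere rise to the top. Libraries differ in the details. For example, scikit-learn’s `TfidfVectorizer` uses a smoothed idf and normalizes each document vector by default, so its scores will not match these exactly. The function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Score every term in every tokenized document with TF-IDF.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / number of documents containing the term)
    """
    n_docs = len(documents)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc)
        scores.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def most_significant(scores: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n highest-scoring terms for one document."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    docs = [
        "the library lends books".split(),
        "the archive keeps letters".split(),
        "the library keeps maps".split(),
    ]
    # "the" appears in every document, so it scores 0 and never ranks first.
    for doc_scores in tf_idf(docs):
        print(most_significant(doc_scores, 2))
```

Dividing by document length keeps long and short documents comparable. Using raw counts instead is another common choice that favors longer texts.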
Revised 2022-09-20
Questions? Contact us
License: CC BY 4.0.
Credit: "Vanderbilt Libraries Digital Lab - www.library.vanderbilt.edu"