This project explores salary data for different job titles in the United States, using Power BI for visualization. The data analyzed is limited to 2022 and was collected from h1bdata.info using Python. The objective is to give students and employees insight into the current job market so they can make informed career decisions. The project is divided into three dashboards, each focusing on a specific aspect of the job market and containing three to four graphs, maps, or pie charts. The analysis aims to answer three questions: what the average salaries are for various job titles, how salaries vary by location, and how specific job markets are growing.

Note: Importing necessary libraries.

import pandas as pd
from bs4 import BeautifulSoup
import requests

Note: Scraping h1bdata website.
url: https://h1bdata.info

# Collect one dict per table row; building the DataFrame once at the end
# avoids the quadratic cost of calling pd.concat inside the loop.
columns = ['employer', 'job_title', 'salary', 'location', 'submit_date', 'start_date']
records = []

# Source data: https://h1bdata.info/index.php?year={year}&job={job}
base_url = "https://h1bdata.info/index.php"
job_titles = ["data+scientist", "senior+data+scientist", "data+analyst", "Big+Data", "machine+learning+engineer", "business+analyst", "Database+Engineer", "Quality+Assurance", "analyst", "qa+analyst", "quality+engineer", "data+engineer", "qa+engineer", "data+warehouse+specialist", "senior+analyst", "quality+analyst", "associate+data+scientist"]

for job in job_titles:
    for year in range(2018, 2023):  # 2018 through 2022 inclusive
        # Job titles are already '+'-encoded, so the query string is built by hand
        url = f"{base_url}?year={year}&job={job}"
        response = requests.get(url, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        # Find the table with class 'tablesorter tablesorter-blue hasStickyHeaders'
        table = soup.find('table', {'class': 'tablesorter tablesorter-blue hasStickyHeaders'})
        # Skip pages without a results table instead of raising AttributeError
        tbody = table.find('tbody') if table else None
        if tbody is None:
            continue
        # Loop through each row of the table body
        for row in tbody.find_all('tr'):
            cells = row.find_all('td')
            if len(cells) >= 6:
                # Extract and trim the text of the first six cells
                values = [cell.text.strip() for cell in cells[:6]]
            else:
                # Keep malformed rows as 'NA' placeholders
                values = ['NA'] * 6
            records.append(dict(zip(columns, values)))

# Build the DataFrame from all collected rows
df = pd.DataFrame(records, columns=columns)

df.head(3)

Note: Result.

   employer                           job_title       salary  location       submit_date  start_date
1  FORTRESS INFORMATION SECURITY LLC  DATA SCIENTIST  45,980  ORLANDO, FL    04/09/2020   10/01/2020
2  PERCOLATA CORPORATION              DATA SCIENTIST  46,060  PALO ALTO, CA  03/18/2016   09/02/2016
3  MY LIFE REGISTRY LLC               DATA SCIENTIST  47,960  FORT LEE, NJ   02/18/2015   08/20/2015
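
Note: Typing the scraped columns (illustrative sketch).

The salary, date, and location columns arrive as text (for example "45,980" and "ORLANDO, FL"), so they may need converting before they can be averaged or grouped. The sketch below shows one possible way to do that in pandas. The helper name clean_h1b_frame, the derived city and state columns, and the MM/DD/YYYY date format are assumptions inferred from the sample rows above, not part of the original notebook.

```python
import pandas as pd

# Sample rows copied from the result above; in practice the input would be
# the scraped DataFrame.
sample = pd.DataFrame({
    "employer": ["FORTRESS INFORMATION SECURITY LLC", "PERCOLATA CORPORATION", "MY LIFE REGISTRY LLC"],
    "job_title": ["DATA SCIENTIST", "DATA SCIENTIST", "DATA SCIENTIST"],
    "salary": ["45,980", "46,060", "47,960"],
    "location": ["ORLANDO, FL", "PALO ALTO, CA", "FORT LEE, NJ"],
    "submit_date": ["04/09/2020", "03/18/2016", "02/18/2015"],
    "start_date": ["10/01/2020", "09/02/2016", "08/20/2015"],
})


def clean_h1b_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert scraped text columns to typed columns."""
    out = raw.copy()
    # "45,980" -> 45980; values that cannot be parsed (such as "NA") become NaN
    out["salary"] = pd.to_numeric(
        out["salary"].str.replace(",", "", regex=False), errors="coerce"
    )
    # Assumption: dates are formatted MM/DD/YYYY; anything else becomes NaT
    for col in ("submit_date", "start_date"):
        out[col] = pd.to_datetime(out[col], format="%m/%d/%Y", errors="coerce")
    # "ORLANDO, FL" -> city "ORLANDO", state "FL"; non-matching rows become NaN
    loc = out["location"].str.extract(
        r"^\s*(?P<city>.*?)\s*,\s*(?P<state>[A-Z]{2})\s*$"
    )
    out["city"] = loc["city"]
    out["state"] = loc["state"]
    return out


cleaned = clean_h1b_frame(sample)
print(cleaned.dtypes)
```

Using errors="coerce" keeps placeholder rows in the frame as missing values rather than raising, so they can be dropped or inspected explicitly afterwards.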

Note: Exporting Data for Analysis with Power BI.

# Save file (pandas uses the openpyxl package to write .xlsx files;
# it must be installed, but no explicit import is needed)
df.to_excel("h1b_salaries.xlsx", index=False)

Note: Power BI Analysis.
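
The three questions the project sets out to answer (average salary by job title, salary variation by location, and growth over time) correspond to simple group-by aggregations. As a rough cross-check on the dashboard figures, they could be approximated in pandas along the following lines. This is a sketch under assumptions: the function name summarize is hypothetical, the state and submit_year columns are assumed to have been derived from location and submit_date beforehand, salary is assumed to be numeric already, and filings per year is used here as one possible proxy for growth.

```python
import pandas as pd

# The three sample rows from the scraper output, with salary already numeric
# and with hypothetical derived columns "state" and "submit_year".
records = pd.DataFrame({
    "job_title": ["DATA SCIENTIST", "DATA SCIENTIST", "DATA SCIENTIST"],
    "salary": [45980, 46060, 47960],
    "state": ["FL", "CA", "NJ"],
    "submit_year": [2020, 2016, 2015],
})


def summarize(df: pd.DataFrame):
    """Hypothetical helper returning the three aggregations as Series."""
    # Average salary per job title
    by_title = df.groupby("job_title")["salary"].mean()
    # Average salary per state, highest first
    by_state = df.groupby("state")["salary"].mean().sort_values(ascending=False)
    # Number of filings per submission year (a simple proxy for growth)
    by_year = df.groupby("submit_year").size()
    return by_title, by_state, by_year


by_title, by_state, by_year = summarize(records)
print(by_title)
print(by_state)
print(by_year)
```

With only three rows the numbers are not meaningful; the point is the shape of the aggregations, which would be applied to the full exported dataset.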

In conclusion, this study examines the H-1B visa job market in the United States, focusing on four job categories: Data Science, Business Analyst, Big Data, and Quality Analyst. Its goal is to give students and employees information they can use to make informed career decisions. Using data visualization tools, the study presents salary data collected from h1bdata.info with Python and addresses three questions: average salaries for different job titles, salary variation by location, and growth trends in specific job markets. The findings can serve as a reference point for anyone pursuing a career in the United States as the job market changes over time.