CRIME-RECOMMENDATION
Focus: Update and recreate a crime analysis using a Transformer, for learning purposes.
1. Overview
Previously, the data was collected from the LACITY website and uploaded to an AWS-hosted database via PlanetScale. Recently, new documentation allows us to use the API to fetch data directly from the source and load it into a SQL database.
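As a minimal sketch of that pipeline, the snippet below pulls records from the LA City open-data API and writes them to a local SQLite table. The Socrata resource ID (2nrs-mtv8), the $limit parameter, and the table name are assumptions based on the public Crime Data (2020–Present) listing; verify them against the current documentation.

```python
import sqlite3

import pandas as pd
import requests

# Assumed Socrata endpoint for "Crime Data from 2020 to Present";
# verify the resource ID against the data.lacity.org catalog.
API_URL = "https://data.lacity.org/resource/2nrs-mtv8.json"

def fetch_crime_data(limit: int = 5000) -> pd.DataFrame:
    """Fetch a batch of crime records from the open-data API."""
    resp = requests.get(API_URL, params={"$limit": limit}, timeout=60)
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

def load_into_sql(df: pd.DataFrame, db_path: str = "crime.db") -> None:
    """Write the fetched records into a local SQLite table."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql("crime_reports", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    records = fetch_crime_data()
    load_into_sql(records)
    print(f"Loaded {len(records)} records into crime.db")
```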
- Motivation: Why is this problem worth solving?
In recent years, I’ve noticed a rise in homelessness and a perceived increase in crime across Los Angeles, which motivated me to apply my data analysis skills and firsthand perspective to these trends. With the pandemic having forced numerous businesses to shut down, my aim is to produce crime data visualizations that offer actionable insights for residents, tourists, and policymakers.
- Key Objective(s): What are the main questions or goals?
- Identify the correlation (if any) between the rise in homelessness and shifts in crime rates across different Los Angeles neighborhoods.
- Pinpoint the specific areas and types of crime that have seen the largest increases post-pandemic to guide data-driven policymaking and community safety efforts.
2. Table of Contents
- Overview
- Table of Contents
- Background and Context
- Data Description
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- Modeling and Analysis
- Model Evaluation
- Results and Discussion
- Conclusion and Future Work
- References and Acknowledgments
3. Background and Context
Los Angeles has recently experienced a perceived rise in both crime and homelessness, prompting a closer examination of the city’s crime data. This repository uses publicly available data from the LACITY website to investigate patterns, trends, and policy implications. By focusing on key metrics and visualizations, the analysis aims to provide actionable insights for residents, policymakers, and other stakeholders. Click here for a link to the source.
- Literature Review: Research on crime mapping, such as that highlighted by the National Institute of Justice’s Mapping Crime: Understanding Hot Spots, shows how advanced spatial analyses can reveal nuanced relationships among homelessness, policy changes, and local crime rates. Drawing on publicly available data, such as the LACITY crime dataset, enables deeper explorations of hot spots, trends, and potential policy implications. Journals like Crime Science and Cartography and Geographic Information Science demonstrate how integrating open data with robust geospatial methods can generate actionable insights for policymakers and communities alike. Relevant Link for Mapping Crime. Relevant Link for Public Psychiatric Services.
- Industry/Domain Context: While I am not a specialist in criminology or a mental health professional, my aim is to create visually compelling representations that highlight how poverty and crime have evolved over time. This project provides informative graphics for public awareness, rather than offering a substitute for expert psychiatric or medical advice. By examining different zones within Los Angeles County, the study strives to emphasize the tangible impact of crime on local communities.
- Hypothesis/Research Questions:
- How have crime patterns evolved over time in Los Angeles County from 2020 to the present?
- Which areas can be identified as “hot zones,” and what underlying factors contribute to these concentrations?
4. Data Description

- Data Source(s)
- The primary dataset is from the LA City Crime Data (2020–Present) repository.
- Additional socioeconomic or demographic data (if applicable) may be sourced from complementary public platforms, such as the U.S. Census Bureau.
- Data Format
- The core crime dataset is available in CSV format, which can be easily imported into a variety of analytical tools and relational databases (a loading sketch follows this section).
- Any supplementary datasets may come in JSON, Excel, or API-based formats, depending on the source.
- Data Fields
- DR Number: A unique identifier for each reported incident.
- Date/Time: Specifies when the crime occurred and allows for temporal trend analysis.
- Location: Includes address or coordinates (Latitude, Longitude) crucial for spatial mapping.
- Crime Classification: Type or category of the crime (e.g., Burglary, Assault).
- Reporting District: Police reporting area code, enabling analysis at the local precinct level.
- Victim Age/Gender/Race (if available): Demographic details that help understand victim profiles.
- Weapon Used (if available): Indicates whether a weapon was involved, aiding severity assessments.
- Data Size
- The crime dataset is continually updated, leading to frequent changes in row counts. As of the latest retrieval, it includes tens of thousands of records spanning multiple crime categories.
- With multiple fields in each record, the memory footprint can range from a few megabytes to larger, depending on the inclusion of geospatial data and historical depth.
- Potential Limitations
- Reporting Bias: Certain crimes may be underreported, skewing perceived distribution and severity.
- Missing or Incomplete Fields: Key attributes (e.g., demographic info, weapon usage) may not always be reported.
- Contextual Factors: Data does not directly include socioeconomic or policy-related variables, which could be vital for interpreting crime trends.
- Temporal Gaps: Depending on data collection intervals, some incidents may take time to appear in the dataset, potentially impacting real-time analyses.
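As a minimal sketch of working with these fields, the snippet below loads the CSV export, parses the occurrence date, and sets aside records that cannot be mapped. The file name and column names (DATE OCC, LAT, LON, Crm Cd Desc) are assumptions based on the published export and should be checked against the actual download.

```python
import pandas as pd

# Assumed file and column names from the LA City CSV export.
df = pd.read_csv("Crime_Data_from_2020_to_Present.csv")

# Parse the occurrence date so temporal trends can be analyzed.
df["DATE OCC"] = pd.to_datetime(df["DATE OCC"], errors="coerce")

# Records without usable coordinates cannot be mapped; set them aside.
mappable = df.dropna(subset=["LAT", "LON"])
mappable = mappable[(mappable["LAT"] != 0) & (mappable["LON"] != 0)]

print(df.shape, mappable.shape)
print(df["Crm Cd Desc"].value_counts().head(10))  # top crime classifications
```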
5. Exploratory Data Analysis (EDA)
Use visuals and statistics to explore trends, patterns, and relationships, and summarize the key takeaways; a short sketch follows the list below.
- Initial Observations: Summary statistics (means, medians, standard deviations).
- Distribution Plots/Graphs: Histograms, density plots, or bar charts.
- Correlation Analysis: Correlation matrix or pairplots.
- Target Variable Analysis: How does the target variable behave?
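A minimal EDA sketch along these lines, continuing from the loading example above (the mappable DataFrame and column names such as Vict Age are carried over as assumptions):

```python
import matplotlib.pyplot as plt

# Initial observations: summary statistics for victim age.
print(mappable["Vict Age"].describe())

# Distribution plot: incidents per month as a bar chart.
monthly = mappable["DATE OCC"].dt.to_period("M").value_counts().sort_index()
monthly.plot(kind="bar", figsize=(12, 4), title="Incidents per month")
plt.tight_layout()
plt.show()

# Correlation analysis across the numeric fields.
numeric = mappable.select_dtypes("number")
print(numeric.corr())
```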
6. Data Preprocessing
Explain your cleaning, transformation, and feature engineering processes; a sketch of each step follows the list below.
- Data Cleaning: Handling outliers, missing values, and inconsistencies.
- Feature Engineering: Creating new features, transforming existing ones, or removing irrelevant features.
- Data Splitting: Train-test splits, cross-validation strategy if applicable.
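A hedged sketch of these three steps, again assuming the columns from the loading example (TIME OCC as an HHMM integer, and non-positive Vict Age values standing for unreported ages):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Data cleaning: treat non-positive victim ages (assumed to mean
# "unreported") as missing.
clean = mappable.copy()
clean.loc[clean["Vict Age"] <= 0, "Vict Age"] = pd.NA

# Feature engineering: derive temporal features from the raw fields.
hhmm = clean["TIME OCC"].astype(int).astype(str).str.zfill(4)
clean["hour"] = pd.to_datetime(hhmm, format="%H%M").dt.hour
clean["month"] = clean["DATE OCC"].dt.month
clean["day_of_week"] = clean["DATE OCC"].dt.dayofweek

# Data splitting: hold out 20% of rows for evaluation.
features = clean[["hour", "month", "day_of_week", "LAT", "LON"]].dropna()
train, test = train_test_split(features, test_size=0.2, random_state=42)
print(len(train), len(test))
```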
7. Modeling and Analysis
Discuss the algorithms or techniques applied and why you chose them; a minimal Transformer sketch follows the list below.
- Model/Algorithm Choices: Linear Regression, Decision Trees, Neural Networks, etc.
- Hyperparameter Tuning: Ranges or search methods used.
- Implementation Details: Libraries, frameworks, and custom functions.
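Since the stated focus is a Transformer, here is one possible formulation as a non-authoritative sketch: treat weekly citywide crime counts as a sequence and use a small nn.TransformerEncoder (PyTorch) to predict the next week's count. The window length and layer sizes are illustrative assumptions, not tuned choices.

```python
import torch
import torch.nn as nn

class CrimeCountTransformer(nn.Module):
    """Predict next week's crime count from a window of past weekly counts."""

    def __init__(self, d_model: int = 32, nhead: int = 4,
                 num_layers: int = 2, seq_len: int = 12):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)   # scalar count -> embedding
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)         # regression output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, 1) windows of weekly counts
        h = self.input_proj(x) + self.pos_embed
        h = self.encoder(h)
        return self.head(h[:, -1, :])             # read out the last position

# Smoke test with random data standing in for real weekly counts.
model = CrimeCountTransformer()
windows = torch.randn(8, 12, 1)
print(model(windows).shape)  # torch.Size([8, 1])
```

Here batch_first=True keeps tensors in (batch, sequence, feature) order, and a learned positional embedding stands in for sinusoidal encodings at this small scale; hyperparameter tuning would sweep d_model, nhead, num_layers, and the window length.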
8. Model Evaluation
Evaluate model performance using appropriate metrics and validation methods; a metrics sketch follows the list below.
- Metrics: Accuracy, Precision, Recall, F1 score, R-squared, RMSE, etc.
- Validation Approach: Cross-validation, hold-out set, or separate validation dataset.
- Comparison of Models: If multiple models were tested, compare their results side-by-side.
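A hedged sketch of computing these metrics with scikit-learn; the y_true/y_pred arrays and the X/y matrices are hypothetical stand-ins for real model outputs and the features produced during preprocessing.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical held-out targets and predictions from a regression model.
y_true = np.array([120.0, 135.0, 110.0, 142.0])
y_pred = np.array([118.0, 130.0, 115.0, 139.0])

rmse = mean_squared_error(y_true, y_pred) ** 0.5
print(f"RMSE: {rmse:.2f}  R^2: {r2_score(y_true, y_pred):.3f}")

# Validation approach: 5-fold cross-validation on a baseline model;
# X and y stand in for the features and target from preprocessing.
X = np.random.rand(100, 5)
y = np.random.rand(100)
scores = cross_val_score(DecisionTreeRegressor(random_state=42), X, y,
                         cv=5, scoring="neg_root_mean_squared_error")
print("CV RMSE:", -scores.mean())
```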
9. Results and Discussion
Present key findings, using tables and figures to illustrate model performance.
- Summary of Results: Main quantitative and qualitative findings.
- Interpretation: What do the results mean in the context of your initial objectives?
- Strengths and Weaknesses: Highlight where the approach did well, as well as any deficiencies or limitations.
10. Conclusion and Future Work
Summarize the insights and recommend next steps.
- Conclusion: How well did you address the problem or objectives?
- Limitations: Any constraints that limited your analysis or the generalizability of your results.
- Future Work: What improvements or extensions would you pursue next?
DASHBOARD

11. References and Acknowledgments
List relevant references and any resources that aided the project.
- References: Papers, articles, or other material that supported your research.
- Acknowledgments: Individuals or organizations that provided assistance, data, or input.