Exploring and visualizing spatial effects and patterns in ridesourcing trip demand and characteristics

Abstract: The complex demand pattern of ride-sourcing remains a challenge to transportation modeling practitioners due to the infancy and the inherently dynamic nature of the ride-sourcing system. Spatial effects exploration and analysis protocols can provide informative insights on the underlying structure of demand and trip characteristics. Those protocols can be thought of as an opportunistic strategy to alleviate the complexity and help specify the appropriate econometric models for the system. Spatial effects exploration is comparable to point pattern analysis, in which signals from spatial entities, like census tracts, can be analyzed statistically to reveal whether the signal distribution of a specific phenomenon is a completely random process or follows some regular pattern. The results of such analysis help to explore the investigated phenomenon and conceptualize its causal forces. In this paper, we apply spatial pattern analysis edge methods integrated into a visual analytics framework to: (1) test the null hypothesis of complete randomness of the system demand; (2) further analyze and explain this demand in terms of the origin-destination (OD) flow and trip characteristics, i.e., length and duration; and (3) underlying complexities in a streamlined workflow. The ridesourcing demand hotspots were explored and identified in the city’s central business district. A novel method to capture and analyze the origin-destination flowlines was developed and implemented. Finally, a complementary trip characteristics pattern analysis was conducted to fully comprehend the system and validate the findings from the system demand points and OD-flowlines.
census tracts of Chicago’s CBD and its surroundings, represented in the significantly identified hotspots of pick-ups and drop-offs, along with the hot OD-flowlines concentrated in the CBD and vanishing toward the peripheral areas, should be further explored in a predictive analytics framework on ridesourcing spatially contextualized pick-ups and drop-offs, as well as city-wide OD-flow modeling. As for the Local Anselin Moran’s I implementation on the clusters and outliers of trip length and duration characteristics, the results flag another key finding: significantly shorter trips are associated with the hot demand in the CBD, while longer trips, also significant, are concentrated in the peripheral census tracts with lower demand. Those findings on the system trip ends and OD flow provide a clear picture of the existence of spatial effects in the ride-sourcing system from a demand perspective as well as in trip characteristics, and would require further analysis to explore whether such a pattern exists in willingness-to-share and mile-price. Moreover, the findings of this paper can be further utilized to fill two current practice and research gaps regarding: (1) standardizing ride-sourcing trip data handling protocols; and (2) incorporating spatial effects into agent-based modeling frameworks of ride-sourcing systems. The latter is considered in the critical component of social networking and interaction present in agent-based models, and their pertinence to spatial effects.


Introduction
The transportation market is no exception to the evolving trend of sharing economy, as the relatively newly introduced family of transportation services known as shared mobility has witnessed a wide acceptance and adoption among users, especially in highly urbanized areas and metro cities (Roukouni & Homem de Almeida Correia, 2020). Amongst the controversial modes of shared mobility is ride-sourcing, which enables users to have access to the mobility-as-a-service model on an as-needed basis (Shaheen, Chan, Bansal, & Cohen, 2015). Ride-sourcing transportation services, provided by companies like Uber and Lyft, also known as Transportation Network Companies (TNCs), are considered a revolutionary trend in mobility and urban travel, involving a high level of real-time computations and matching algorithms, equipped with seamless payment and surge-pricing, solving a good number of mobility and travel problems (e.g., first and last trip mile). The service runs on top of a set of system enablers, including smartphone application that pairs riders and community drivers (in non-commercial vehicles) given their spatial and temporal constraints, online transactions (e-payment) to guarantee seamless payment method, and a cloud-based surge-pricing scheme matching the supply with the demand (Shaheen et al., 2015). The service is either offered in the form of sequential trips, i.e., serving individual riders, or concurrently to several riders in a ride-and fare-splitting fashion (e.g., Uber Pool and Lyft Line).
Since the inauguration of Uber early in the past decade, the implications of the newly introduced shared, on-demand, and potentially automated mobility services have been significant, especially in the way people choose what activities to do and where and when to do them. The demand for TNC ride-sourcing services is growing exponentially: Uber took approximately six years to reach 1 billion rides in late 2015, and only six more months to reach 2 billion rides (Uber, 2017). This positions ride-sourcing as a game-changer in the transportation market. However, there is a lack of information about the impact such smart mobility services will have on the planning, operation, and management of existing and future transportation networks. It is not yet clear how people make their mode choice for ride-sourcing, nor whether this demand is induced (Rayle, Shaheen, & Chan, 2014) or shifted from other modes. In either case, research reveals that it has implications on congestion (Erhardt et al., 2019) and transit ridership (Erhardt et al., 2021). In an exploration of the impact of TNC ride-sourcing on congestion intensity and duration, a study (Diao, Kong, & Zhao, 2021) reported increases of 0.9% and 4.5%, respectively, consistent with the findings of (Erhardt et al., 2019). A decline of 8.9% was reported for the ride-sourcing impact on transit ridership, again consistent with the finding in (Erhardt et al., 2021). Moreover, in (Henao, 2017), the author drove for Uber and Lyft as part of the data collection and reached similar findings in terms of increased Vehicle Miles Traveled (VMT) and congestion.
Despite the relief TNCs offered to some chronic accessibility and mobility issues (e.g., impaired driving, first and last-mile connections to public transit, and late-night services), recently released reports and studies suggest that these newly introduced services accounted for approximately 50% of the increase in congestion in San Francisco between 2010 and 2016 (Castiglione et al., 2018). These figures on VMT and transit ridership can get even more significant when considering zero-occupied vehicles in an unmanned-autonomous-vehicles (UAV) ecosystem. Other research concluded that Uber boosts public transit ridership by five percent after two years of introduction in an average transit agency area (Hall, Palsson, & Price, 2018). Henao et al. (Henao, Marshall, & Janson, 2019) signal other impacts on travel behavior, parking demand, and transportation equity. Confirming the controversial nature of ridesourcing as a disruptive transportation service with uncertain impacts, Jin et al. (Jin, Kong, Wu, & Sui, 2018) perceive ride-sourcing as positively impacting the economy and as a chance rather than a threat to transit. While they remain uncertain of its impact on car ownership, energy consumption and greenhouse gas emissions, they affirm the exacerbated digital division and exclusion from an equity standpoint. Therefore, it was a non-trivial challenge when transportation scientists and practitioners were confronted with the newly introduced mode of TNC ride-sourcing services without sufficient data to integrate this mode in the current state of the practice and modeling framework. They relied on state-of-the-art conceptual and simulation-based frameworks to provide a preliminary understanding of the mode without resorting to real-world ride-sourcing trip data (F. Alemi & Rodier, 2016; Basu et al., 2018; Kelleny & Ishak, 2018; Winter, Cats, Martens, & van Arem, 2020).
Other endeavors depend on data fusion approach between a relatively small ride-sourcing trip dataset or small to medium scale survey (Farzad Alemi, 2018;Hermawan, 2018) injected in the proposed modeling framework along with more generalized and more enormous datasets (Oviedo, Granada, & Perez-Jaramillo, 2020;Reck & Axhausen, 2020). Stated-preferences (Sperling, Young, Garikapati, Duvall, & Beck, 2019) and household travel revealed preferences surveys also contributed to form an initial understanding of ride-sourcing users' perceived utility (Azimi, Rahimi, Asgari, & Jin, 2020;Dias et al., 2017). The first real-world TNCs ridesourcing trip data was released in 2014 on Uber along with other for-hire vehicle pick-ups from New York City. The data was limited to pick-up locations only and obfuscated for privacy, yet it was useful to conceptualize analytical spatial and temporal methods (Correa, Xie, & Ozbay, 2017;Faghih, Shah, Wang, Safikhani, & Kamga, 2020) on the first exposure to real-world ride-sourcing trip data.
At this stage of the system's maturation of understanding, a crucial dimension remains underrated and unexplored, which is the system's underlying spatial characteristics and effects. Given the inherent similarities between ride-sourcing and futuristic shared unmanned autonomous vehicles (SUAV), this spatial exploration and segmentation is pertinent to SUAV-hubs planning and deployment and is very timely. Research should provide answers to the following questions: (1) how can ride-sourcing trip demand be analytically and spatiotemporally analyzed? (2) do the demand patterns portray any forms of clusters or urban pockets, or are they completely random processes? and (3) do the trip characteristics constituting this demand also exhibit a specific pattern of clusters? Capturing those patterns and the significance of their spatial effects is pivotal to integrating the system into regional multimodal transportation modeling frameworks, and to providing a coherent understanding of the way the system's trip demand and characteristics are spatially shaped and growing.
Developing such spatial effects exploration protocols, perceived as a research gap in ridesourcing, is essentially data-driven and would be meaningless if conducted without resorting to real-world data. Therefore, to fill this gap in ride-sourcing system research and analysis, we propose the spatial exploration and visualization framework developed in this paper, and we utilize the City of Chicago TNC trip dataset (City of Chicago, 2021) obtained at the census tract level to showcase the workflow, results, and findings. The rest of this paper is organized as follows to answer the questions above and attain those objectives: (1) literature review on the spatial effects methods in different transportation modal domains; (2) case study data and the pipeline on preprocessing, handling and synthesizing their spatial structure and trend mining protocols; (3) methodological aspects of ride-sourcing spatial effects exploration proposed in this work; and finally (4) the implementation remarks and results. The conclusions section will set the future direction to advance this research.

Literature review
The term "spatial effects" refers to an observed pattern of dependence (or autocorrelation) in a spatial phenomenon (L. Anselin, 1988; Luc Anselin, 1999; Beron & Vijverberg, 2004). That heterogeneous pattern is typically governed by the underlying fabric, such as socioeconomic or demographic structure in urban phenomena. The integration of spatial effects in transportation planning and modeling remains an unbeaten track (Lopes, Brondino, & Silva, 2014), and a challenge to areas of advanced modeling, e.g., Spatial Agent-Based Models, in which contextualizing the spatial effect in the agents' set of rules and relationships is still difficult (Heppenstall & Crooks, 2019), especially in the shared mobility and ride-sourcing context (Kelleny & Ishak, 2018). Even for conventional transportation models, ecological fallacy and spatial dependence were amongst the key methodological issues discussed (Wang, Quddus, Ryley, Enoch, & Davison, 2012). Failing to detect, assess and account for those effects prior to modeling the phenomenon could lead to misspecified linear spatial regression models (Florax & Nijkamp, 2005). In a regression context, spatial heterogeneity is comparable to "non-constant error variances (heteroskedasticity)" (Luc Anselin, 1999). However, in an exploratory analysis context, as in this research, it refers to agglomerations of spatial entities of phenomenal (high or low) values of the explored system, i.e., ride-sourcing pick-ups and drop-offs (Florax & Nijkamp, 2005).
Spatial effects exploration is comparable to point pattern analysis, in which signals from spatial entities, like census tracts, can be analyzed statistically to reveal whether the signal distribution is a completely random process or follows some regular pattern (Bailey & Gatrell, 1995). The results of this analysis help to narrow down the "causal forces" of the investigated phenomenon (Boots & Getis, 1988). Correa et al. (2017) is amongst the early endeavors to account for spatial effects in modeling ride-sourcing demand. Global Moran's I (Moran, 1950) was used only to test for the existence of spatial dependence in the demand data of Uber and Taxi, which gave rise to adopting spatial lag and spatial error models along with linear models. However, no further exploration of such a spatially dependent pattern was conducted in terms of highlighting the associated urban pockets or exploring the effects in trip characteristics. Yu and Peng (2019) used the same Global Moran's I metric on spatial autocorrelation to capture the heterogeneous pattern of ride-sourcing trip demand data prior to developing a Geographically Weighted Poisson Regression model. Soria and Stathopoulos (2021), in their work on solo and pooled trips behavior modeling, implemented a spatial Durbin model to account for the so-called "spatial spillover", or the impact of neighboring entities induced into their ride-splitting behavior.
In a transit ridership context and using GPS trajectories and smart card data, a study (Tu et al., 2018) highlighted the limitations of ordinary least square (OLS) globally fitted regression models in capturing spatially variant (heteroscedastic) demand, and therefore, proposed a Geographically Weighted Regression model (GWR). However, the GWR implementation was justified only by the clustered pattern of the dependent variable and regressors through visuals and quantified by the GWR outperformance at the expense of the OLS global model. No statistical protocols were utilized to assert such a pattern. Also (Liu, Sun, Sun, & Gao, 2020) adopted the same GWR modeling approach to control for the spatial effects in some of the regressors. In this work, the urban pockets breeding such heterogeneity, referred to as hotspots, were explored by means of Kernel density estimation (KDE), since the trip data was delivered in GPS trajectories format. However, no further elaboration on the statistical significance of those hotspots was provided in this work.
In a rail trip demand setting, a study (Cordera, Sañudo, dell'Olio, & Ibeas, 2018) introduced the Spatial Filtering approach to control for spatial effects in the demand driving variables to two models, namely, Poisson regression and gravity, to estimate the demand distribution between rail stations. The spatial effects accounted for by the gravity model exhibited significantly better performance, yet the models' performance metrics were the only gauge on spatial dependence. In their "exploratory analysis on freight trip attraction" (Sánchez-Díaz, Holguín-Veras, & Wang, 2016), the authors utilized the traditional Global Moran's I referred to earlier, but with a localized implementation using Local Indicators of Spatial Association, also known as Local Anselin Moran's I (Luc Anselin, 1995) to reveal the pattern of association in the proposed freight model explanatory variables related clusters. The explorative framework for spatial effects then gave rise to adopting a spatial econometric model that outperformed the OLS fitted model.
The framework proposed in this paper makes two major contributions in the ride-sourcing research area: (1) conceptual contribution: on incorporating spatial effects and heterogeneity in trips data prior to model specification and development; (2) methodological contribution: on analyzing TNC trip data and providing guidance to researchers and practitioners on spatial effects exploration and identification in such type of data to integrate it into multimodal transportation frameworks. The implications of this research can be extended to informed policy-making, particularly congestion pricing. Identifying the demand hotspots and the heavily trafficked OD-pairs can help better implement measures like cordon and/or corridor-based congestion pricing mechanisms to regulate and mitigate the impacts of TNCs on congestion.

Research methods
The research framework presented in this paper is data-driven, i.e., we pair our proposed methodological aspects of ride-sourcing spatial effects exploration and identification with a case study to showcase the workflow, results, and findings. Therefore, it is indispensable to discuss the background of the data we utilize in this section, along with the cleaning and rebuilding protocols. Thus, the first sub-section is organized to include: (1) trip ends, i.e., pick-up and drop-off, cleaning and preprocessing; (2) trip data spatial structure and pattern mining and exploration; and (3) origin-destination flowlines cleaning and preprocessing. In the second sub-section, we focus on the development of the methods.

Trip data description, cleaning, and preprocessing
Starting from November 2018, all TNCs operating in the City of Chicago were required by ordinance to report all their trips regularly, and the TNC trip data has been made publicly available through the Chicago Data Portal and updated on a quarterly basis. We chose to work with the year 2019 data to avoid any potential flaws in data reporting or packaging at the start of the initiative in 2018, and to avoid the atypical travel behavior associated with the 2020 COVID-19 pandemic. The 2019 data has more than 97 million trip records. The variables relevant to this research are summarized in Table 1, along with their type, description, and cleaning and rebuilding protocols, if any. Trips that start or end in either of the airports' census tracts, O'Hare International Airport and Chicago Midway International Airport, are also eliminated from this analysis, since the airports' demand is not within the scope of this study due to its distinctive nature. Trips with a length of more than 50 miles are excluded, since they are not compatible with the extent of the study area.

If entry is missing along with the pickup census tract, the entire trip record is removed.

Table 1 (excerpt)
Variable: Dropoff Centroid Location
Type and description: Point (Geometry); the location of the center of the dropoff census tract (longitude, latitude)
Cleaning and rebuilding protocol: If entry is missing along with the dropoff census tract, the entire trip record is removed
Source: (City of Chicago, 2021)

The trip characteristics distribution depicted in Figure 1-a and Figure 1-b shows a higher density of short trips. However, erroneous observations appear to exist in the data, as can be seen in the relation between trips' lengths and durations shown in Figure 1-c. Treating these erroneous values as outliers would result in losing a considerable portion of the data on trip characteristics. Moreover, intuitive scrutiny of those outliers leads to a strong belief that they are the outcome of errors in measurements or handling. For example, some observations have a length of several miles, yet the duration is a fraction of a second. This is a typical case of data contamination, in which data reduction can be a good remedy (Serneels & Verdonck, 2008), especially with the low-dimensionality space dealt with in this section. Since we are attempting to reduce the dimensions of the trip length-duration data while refraining from distorting the variance in the pattern we are studying in the first place, principal component analysis (PCA) can be a good candidate. PCA has been a key player in this application area due to its protocol for minimizing the variance loss (Jolliffe & Cadima, 2016). A key concern for applying PCA to the trip characteristics is the impact of the outliers on the quality of the process, as discussed in (Sapra, 2010), for which robust variants of PCA were proposed and developed. This concern is magnified in high-dimensional applications, which is not the case tackled here. Moreover, the two features reduced here can be readily assumed to be correlated, and therefore, the impact of the outliers would be mitigated by the contribution of the common surface from correlation.
Therefore, a classic PCA is applied to reduce the trip lengths and durations into one component, with the explained variance ratio adopted as the determinant metric of the approach's performance and validity. The trip characteristics data were not standardized prior to implementing the PCA; the PCA workflow in the Python machine learning library scikit-learn (Pedregosa et al., 2011) centers the data before decomposition (note that it centers each feature but does not rescale it to unit variance). The resulting component is then scaled to unit variance, as this is found to be consistent with the downstream application of the local Anselin's index. An explained variance ratio of 0.9999 is returned from the PCA, which indicates a minimal loss of variance in the decomposition, and the distribution of the trips' length-duration principal component (LDPC) is shown in Figure 1-d. The skewness in the distribution of the trips' LDPC suggests a higher frequency of shorter trips. The median, therefore, is chosen to represent the central location of the trips' LDPC of the analyzed census tracts. The geographical context of the trips' median LDPC is shown in Figure 2. The LDPC median ranges from negative fractional values, representing very short trips in terms of length and duration, to positive values greater than 1 for relatively longer trips.
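The reduction described above can be illustrated with a minimal sketch, using only the Python standard library rather than scikit-learn. It is not the authors' pipeline: the function names (`first_principal_component`, `median_score_by_tract`), the toy trip values, and the tract labels are hypothetical, and the sketch assumes the data are centered but not rescaled before the decomposition. For two features, the principal axis follows from the closed-form eigen-decomposition of the 2x2 covariance matrix.

```python
import math
from statistics import mean, median


def first_principal_component(lengths, durations):
    """Project centered (length, duration) pairs onto their first principal axis.

    Returns the scores scaled to unit sample variance and the explained
    variance ratio of that first component.
    """
    n = len(lengths)
    mx, my = mean(lengths), mean(durations)
    dx = [v - mx for v in lengths]
    dy = [v - my for v in durations]
    # Sample covariance matrix [[a, b], [b, c]] of the centered data.
    a = sum(u * u for u in dx) / (n - 1)
    c = sum(v * v for v in dy) / (n - 1)
    b = sum(u * v for u, v in zip(dx, dy)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector belonging to the largest eigenvalue.
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Flip the axis if needed so that its components sum to a positive value.
    if vx + vy < 0:
        vx, vy = -vx, -vy
    scale = math.sqrt(lam1)  # the projection has sample variance lam1
    scores = [(u * vx + v * vy) / scale for u, v in zip(dx, dy)]
    return scores, lam1 / (lam1 + lam2)


def median_score_by_tract(tract_ids, scores):
    """Median component score of the trips recorded in each census tract."""
    grouped = {}
    for tract, score in zip(tract_ids, scores):
        grouped.setdefault(tract, []).append(score)
    return {tract: median(vals) for tract, vals in grouped.items()}


# Hypothetical trips: length in miles, duration in seconds, pickup tract label.
lengths_mi = [1.0, 2.0, 3.0, 4.0, 10.0]
durations_s = [300.0, 620.0, 880.0, 1250.0, 2900.0]
tracts = ["A", "A", "B", "B", "B"]

ldpc, ratio = first_principal_component(lengths_mi, durations_s)
medians = median_score_by_tract(tracts, ldpc)
```

Because the features are not rescaled in this sketch, the component is dominated by whichever variable has the larger numeric spread (here, duration in seconds), which is one plausible reason for an explained variance ratio very close to 1.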

Trip Data Spatial Structure and Trends Mining
The empirical spatial means (ESM) for the daily pick-ups and drop-offs, $\mu_{P,S}(S_i)$ and $\mu_{D,S}(S_i)$, respectively, as in (1) and (2), reveal the dominant magnitude of demand across the investigated geography during the analysis period. The denominator in the ESM equations, $T$, equals the number of days on which the corresponding census tract has more than zero pick-ups or drop-offs. As shown in Figure 3, daily pick-ups and drop-offs exhibit a similar pattern of magnitude, concentrated in and around the central business district (CBD) census tracts and vanishing toward the peripheries. It should be noted that the ESM does not necessarily indicate a statistically significant hotspot, which is an exercise that should be worked out on its own, as will be shown in the next section. The demand patterns revealed by means of the ESM may suggest that ride-sourcing system demand is not a perfectly random process. Analyzing the empirical spatiotemporal covariance (ESTC) can provide a preliminary understanding in this regard. The ESTC between two census tracts, $S_i$ and $S_j$, is the covariance between the demand time series in both geographies. The ESTC matrix for pick-ups and drop-offs can be constructed as shown in (3) and (4), respectively. A pairwise Euclidean Distance (ED) matrix of the census tracts is constructed to study the interdependencies between the census tracts' demand co-variability, i.e., the ESTC, and the respective ED. One can notice an exponentially decaying trend between the ESTC and the ED shown in Figure 4, which may again suggest spatial effects in the system.
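The two summary quantities in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the paper's equations: the ESM is taken as the mean daily count over the days with nonzero counts, the ESTC as the ordinary sample covariance between two tracts' daily series, and all names and the toy data (`series`, `centroids`, tract labels) are hypothetical.

```python
import math


def empirical_spatial_mean(daily_counts):
    """Mean daily count over the days on which the tract recorded any trips."""
    active = [c for c in daily_counts if c > 0]
    return sum(active) / len(active) if active else 0.0


def covariance(xs, ys):
    """Sample covariance between two equally long daily series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def covariance_vs_distance(series, centroids):
    """(Euclidean distance, covariance) for every unordered pair of tracts."""
    ids = sorted(series)
    pairs = []
    for k, a in enumerate(ids):
        for b in ids[k + 1:]:
            d = math.dist(centroids[a], centroids[b])
            pairs.append((d, covariance(series[a], series[b])))
    return pairs


# Hypothetical daily pick-up counts for three tracts over four days.
series = {
    "T1": [10, 12, 0, 14],
    "T2": [9, 11, 1, 13],
    "T3": [1, 0, 2, 0],
}
# Hypothetical projected centroid coordinates (same unit for x and y).
centroids = {"T1": (0.0, 0.0), "T2": (1.0, 0.0), "T3": (10.0, 0.0)}

esm = {tract: empirical_spatial_mean(counts) for tract, counts in series.items()}
pairs = covariance_vs_distance(series, centroids)
```

Plotting the second element of each pair against the first would give a covariance-versus-distance scatter of the kind the paragraph describes; with real data the pairs would number in the hundreds of thousands, so a vectorized implementation would be preferable.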

OD Flow Data Cleaning and Preprocessing
The OD flow data had to undergo some cleaning and preprocessing protocols so that their magnitude could be analyzed and the underlying spatial effects could be explored and evaluated statistically. The study area initially comprised 638,401 OD-pairs, excluding the two airports' census tracts. They are visualized by means of the total yearly trips encoded to the respective flowline in Figure 5 and weighted by 2 × 10^-5 to reduce the visual clutter and emphasize the heavily trafficked OD flowlines. The trip ends of the census tracts' internal trips, i.e., trips that started and ended within the same census tract and ended up having both trip ends assigned to the census tract centroid, were assigned to two random geolocations within the respective census tract polygon (see Handling Internal Trips in Figure 5(a)). Thus, heavily trafficked internal OD-flowlines would be captured in this analysis too. Figure 5(b) depicts high OD intensity near the CBD area.
Only 25% of the aggregated OD pairs accumulated more than 17 trips during the entire analysis year. Figure 6 offers a better depiction of the skewness and the kurtosis of the aggregated OD flow. This long-tailed, thin distributional shape, along with the exponentially decaying trend of the relation between the flow and the Euclidean length of the OD-pairs depicted in Figure 7, suggests a remarkably lower likelihood of flow between distant origin and destination census tracts. Therefore, to conduct a conclusive analysis of the magnitude of the OD flow between census tracts in the study area, an OD-flow sampling protocol is recommended in this study. We propose an arbitrary number of 2000 aggregated trips as the threshold for OD-pairs to be further considered. After filtering the data using that threshold, only 7142 OD-pairs remain.
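The aggregation and thresholding step can be sketched in a few lines. The function names and the toy trip list are hypothetical, OD pairs are treated as ordered (origin, destination) tuples, and the threshold is assumed to be inclusive, which the text does not specify.

```python
from collections import Counter


def aggregate_od(trips):
    """Count trips per ordered (pickup_tract, dropoff_tract) pair."""
    return Counter((origin, destination) for origin, destination in trips)


def keep_heavy_pairs(od_counts, min_trips=2000):
    """Keep only the OD pairs with at least `min_trips` aggregated trips."""
    return {pair: n for pair, n in od_counts.items() if n >= min_trips}


def share_above(od_counts, value):
    """Fraction of OD pairs whose aggregated trips exceed `value`."""
    return sum(1 for n in od_counts.values() if n > value) / len(od_counts)


# Hypothetical trip records reduced to (pickup tract, dropoff tract).
trips = (
    [("A", "B")] * 3
    + [("B", "A")] * 1
    + [("A", "A")] * 4  # internal trips within tract A
    + [("C", "A")] * 2
)

od_counts = aggregate_od(trips)
heavy = keep_heavy_pairs(od_counts, min_trips=3)  # toy threshold instead of 2000
share = share_above(od_counts, 2)
```

With the yearly data, `share_above(od_counts, 17)` would correspond to the 25% figure quoted above, and `keep_heavy_pairs(od_counts)` to the retained subset of OD-pairs.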

ISSN 2520-2979
Journal of Sustainable Development of Transport and Logistics, 6(2), 2021

Despite the significant portion of the pairs masked by that threshold, this approach should be acceptable in the context of capturing and analyzing the magnitude of the OD flow between census tracts. Following this step, a mesh with a cell size of approximately 300 x 300 feet is constructed to summarize and compute the density of the OD flow. This visual analytic method is inspired by the work proposed in (Wood, Dykes, & Slingsby, 2010), and known as cell-based symbolization, in which the local number of flows is captured within each cell of the mesh, enabling further analytics to be performed on the flow. To elaborate, the scope of the analysis is to capture spatial heterogeneity and spatial dependence in the pattern of the OD flow; thus, any loss in the granularity of the pairs' flow magnitude would hinder such attainment. The OD map visual analytic method allows for reducing the visual clutter as well as capturing the analytical component of the flow. However, amongst the limitations of the method is the inability to discern the direction of the flow, yet that is not a barrier in this spatial analysis, which focuses on the heterogeneity and dependence of the flow magnitude.
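One way to approximate the cell-based summary is sketched below. It is an assumption-laden illustration, not the method of Wood et al. or of the GIS tool used in the paper: each flowline is sampled at quarter-cell steps, the set of cells it touches is collected, and a weight (1 per flowline here) is added to each touched cell. Sampling can miss a cell that a line only clips at a corner. All names and coordinates are hypothetical, and coordinates are assumed to be in feet in a projected system.

```python
import math
from collections import defaultdict


def cells_crossed(x0, y0, x1, y1, cell=300.0):
    """Approximate set of grid cells touched by a straight flowline."""
    length = math.hypot(x1 - x0, y1 - y0)
    steps = max(1, math.ceil(length / (cell / 4.0)))  # quarter-cell sampling
    visited = set()
    for k in range(steps + 1):
        t = k / steps
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        visited.add((math.floor(x / cell), math.floor(y / cell)))
    return visited


def flow_density(flowlines, cell=300.0):
    """Sum the weights of all flowlines touching each cell of the mesh."""
    density = defaultdict(float)
    for x0, y0, x1, y1, weight in flowlines:
        for index in cells_crossed(x0, y0, x1, y1, cell):
            density[index] += weight
    return dict(density)


# Two hypothetical flowlines: (x0, y0, x1, y1, weight).
flowlines = [
    (50.0, 50.0, 950.0, 50.0, 1.0),   # runs east through four cells
    (350.0, 50.0, 350.0, 650.0, 1.0),  # runs north through three cells
]

density = flow_density(flowlines)
```

The cell where the two lines cross accumulates both weights, which is the local flow count that the subsequent hotspot analysis operates on. Replacing the unit weight with the aggregated trip count of each OD-pair would give a trip-weighted variant.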

Methodological aspects of ride-sourcing spatial effects exploration
We focus on two aspects of the underlying pattern: (1) hotspot areas; and (2) clusters and outliers. Hotspot analysis can provide key insights on ride-sourcing trip demand and OD-flow spatial heterogeneity with rigorous statistical tests of significance, as developed by Getis and Ord (Getis & Ord, 2010) and known as "Getis-Ord Gi*" (Esri Inc., 2021b). Getis and Ord developed the basic statistic to test the spatial association between significantly high and low weighted values with respect to their spatial relations. The definition of the spatial weight incorporated in the test statistic is pivotal. Based on the findings from the visual representation of the ESTC and ED in the previous section, an inverse squared Euclidean distance between the census tracts' centroids is selected to tune the association tested. The Z-values, essentially the Getis-Ord Gi* statistic expressed in standard deviations (see (5)) (Esri Inc., 2021b; Getis & Ord, 2010), test the null hypothesis that the $i$-th tested spatial entity is not a significant hot or cold spot. High absolute Z-values, i.e., above 1.65, 1.96, and 2.58, and their respective low p-values, i.e., 0.1, 0.05, and 0.01, will result in rejecting this hypothesis and identifying the spatial entity as a hot or cold spot at confidence levels of 90%, 95%, and 99%, respectively. The signal $x_j$ is the pattern value in geography $j$, $j \in \{1, \dots, n = 799\}$, namely the ESM $\mu_{P,S}(S_i)$ for the daily pick-ups and the ESM $\mu_{D,S}(S_i)$ for drop-offs. As for $w_{i,j}$, it is the spatial weight: the inverse squared Euclidean distance between the geographies' centroids. It is noteworthy that the signal value of the target geography $x_i$ is not included in the global format of Getis-Ord General G (Getis & Ord, 2010).
This absence of the target geography from the scoring process allows for capturing the spatial effect, if any, on the adjacent geographies in a presumably exponentially decaying weight, i.e., to evaluate the signal from the hypothetically significant cluster against the global signal value. This should be emphasized here because the approach adopted to analyze the pick-ups and drop-offs pattern, although local, is adapted from a global one to reveal the existence of significant value clusters. Whilst for the analysis that will be conducted on the trips' LDPC in a later section, an explicit local approach is adopted, in which the statistical test will be performed on a signal-to-signal basis.
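A compact sketch of the standardized Gi* score with inverse squared distance weights follows. It uses the commonly documented form of the statistic, in which the weighted local sum is compared with its expectation under the global mean and scaled by the global standard deviation. The handling of the target geography's own weight is an assumption of this sketch: `self_weight` is a hypothetical parameter (set to 1.0 here), and the paper's software settings may differ. Coincident centroids would cause a division by zero and are not handled.

```python
import math


def getis_ord_gi_star(points, values, self_weight=1.0):
    """Gi* z-score per location, using inverse squared Euclidean distance weights."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(self_weight)  # assumed weight of the target itself
            else:
                d = math.dist(points[i], points[j])
                weights.append(1.0 / (d * d))
        sum_w = sum(weights)
        sum_w2 = sum(w * w for w in weights)
        numerator = sum(w * x for w, x in zip(weights, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w * sum_w) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


def confidence_bin(z):
    """Map a z-score to the 90/95/99% hot or cold spot bins described above."""
    a = abs(z)
    level = 99 if a > 2.58 else 95 if a > 1.96 else 90 if a > 1.65 else 0
    if level == 0:
        return "not significant"
    return f"{'hot' if z > 0 else 'cold'} spot ({level}%)"


# Hypothetical centroids on a line and their signal values (e.g., ESM of pick-ups).
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
signal = [10.0, 12.0, 11.0, 1.0, 2.0, 1.0]

z = getis_ord_gi_star(points, signal)
labels = [confidence_bin(score) for score in z]
```

In this toy layout the three high-valued neighbors receive positive scores and the three low-valued neighbors negative ones. Because the weights are raw inverse squared distances, the result depends on the distance unit relative to `self_weight`, which is why that choice should be made deliberately.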
The cluster and outlier analysis proposed for exploring the spatial effects in the trips' LDPC is predicated on Anselin Local Moran's I (Luc Anselin, 1995) and is proposed to reveal how the spatial pattern of trip characteristics complements the findings from the hotspot analysis and provides a full picture of the system's spatial pattern. This cluster and outlier analysis approach should be distinguished from the hotspot analysis approach, though. Although both approaches evaluate local signals, the first one, i.e., Getis-Ord Gi*, does that against the average global signal, whilst Anselin's local index looks into the way the signals vary from one geography to another. This is found to be more harmonious with the tested signal, i.e., the trip LDPC, to see how hot or cold spots, if any, constituted by urban pockets of trips, are spatially clustered by their LDPC. The test statistic, therefore, is not calculated directly for the signal; rather, each geography is assigned an index, that is, the Anselin Local Moran's I, derived from the signal (see (6)) (Luc Anselin, 1995; Esri Inc., 2021c), which is thereafter used for the Z test statistic (see (7)). The null hypothesis in this analysis is that the system is an outcome of a complete spatial random process.
Where: $i, j \in \{1, \dots, n = 799\}$. For the z-score, $E[I_i]$ is the mean and $V[I_i]$ is the variance of the index.
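The local index and its cluster or outlier labels can be sketched as below, using the commonly documented form in which each location's deviation from the global mean is multiplied by the weighted sum of its neighbors' deviations and scaled by the variance of the other locations. Two points are assumptions of this sketch rather than the paper's procedure: the weights are plain inverse distances without row standardization, and significance is assessed with conditional permutations (pseudo p-values) instead of the analytical mean and variance behind the z-score. All names and data are hypothetical.

```python
import math
import random


def inverse_distance_weights(points):
    """Dense weight matrix with w_ij = 1 / d_ij and a zero diagonal."""
    n = len(points)
    return [
        [0.0 if i == j else 1.0 / math.dist(points[i], points[j]) for j in range(n)]
        for i in range(n)
    ]


def local_moran(values, weights):
    """Return (I_i, label) per location; label is HH, LL, HL, or LH."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_sq = sum(d * d for d in dev)
    results = []
    for i in range(n):
        s2 = (total_sq - dev[i] ** 2) / (n - 1)  # variance of the other locations
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        # First letter: own value vs. mean; second letter: neighbors vs. mean.
        label = ("H" if dev[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((dev[i] / s2 * lag, label))
    return results


def pseudo_p_values(values, weights, permutations=199, seed=0):
    """Conditional permutation test: keep x_i fixed, shuffle all other values."""
    rng = random.Random(seed)
    observed = [index for index, _ in local_moran(values, weights)]
    p_values = []
    for i in range(len(values)):
        others = values[:i] + values[i + 1:]
        hits = 0
        for _ in range(permutations):
            rng.shuffle(others)
            shuffled = others[:i] + [values[i]] + others[i:]
            simulated = local_moran(shuffled, weights)[i][0]
            if abs(simulated) >= abs(observed[i]):
                hits += 1
        p_values.append((hits + 1) / (permutations + 1))
    return p_values


# Hypothetical tract centroids and median LDPC-like signal values.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
signal = [10.0, 12.0, 11.0, 1.0, 2.0, 1.0]

w = inverse_distance_weights(points)
moran = local_moran(signal, w)
p = pseudo_p_values(signal, w)
```

Positive index values with HH or LL labels mark clusters of similar values, while HL and LH labels with negative index values mark outliers. The sketch recomputes the full index for every permutation, which is acceptable for a handful of locations but would need optimizing for several hundred census tracts.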

Implementation remarks, results and discussion
The proposed methods are implemented using the Hot Spot Analysis (Getis-Ord Gi*) module in ArcGIS Pro spatial statistics toolbox (Esri Inc., 2021b). The results on TNC ESM pick-ups and drop-offs shown in Figure 8 support the existence of a hotspot of ride-sourcing demand in the CBD area, consistent with the early findings from the visuals in Figure 3. Nearly all census tracts outside the CBD cordon exhibit non-significant demand patterns comparable to the magnitude and concentration of the CBD one. The existence of such a hotspot flags heterogeneity in ride-sourcing demand.

Figure 8: Daily Pick-ups and Drop-offs Hotspot Analysis Using Getis-Ord Gi* Statistics
The hotspot analysis on the OD-flow is conducted on the OD-summarizing cells explained earlier by means of the Hot Spot Analysis (Getis-Ord Gi*) module in the ArcGIS Pro spatial statistics toolbox (Esri Inc., 2021b). The spatial relationship between the OD-summarizing cells is conceptualized as a rapidly decaying relation, i.e., inverse Euclidean distance squared, as elicited from the exponentially decaying trend observed in Figure 7. The cell-based OD mapping process shown in Figure 9 (a) shows prevailing OD flow activities in the CBD, extending into the northwest census tracts. The OD hotspot analysis results shown in Figure 9 (b) reveal similar findings, with the hotspot cells concentrated around the CBD and dissipating into thin traces toward the northwest.
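The Gi* computation under an inverse-distance-squared conceptualization can be sketched as follows. This is a minimal illustration of the published Getis-Ord Gi* formula, not the ArcGIS Pro implementation; the function names, the toy 5 x 5 grid of cells, the flow counts and the self-weight of 1.0 are all assumptions made for the example.

```python
import numpy as np


def inverse_distance_squared_weights(coords, self_weight=1.0):
    """Dense weights w_ij = 1 / d_ij**2 from (n, 2) cell-centroid coordinates.

    The diagonal is set to `self_weight` (an assumption of this sketch),
    because Gi* counts each cell as part of its own neighbourhood.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, 1.0)  # placeholder that avoids division by zero
    weights = 1.0 / dist ** 2
    np.fill_diagonal(weights, self_weight)
    return weights


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every cell."""
    x = np.asarray(values, dtype=float)
    n = x.size
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)  # global standard deviation
    w_sum = weights.sum(axis=1)
    w_sq_sum = (weights ** 2).sum(axis=1)
    numerator = weights @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical 5 x 5 grid of OD-summarizing cells with one busy corner.
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
flows = np.ones(25)
flows[[0, 1, 5, 6]] = 50.0
gi_z = getis_ord_gi_star(flows, inverse_distance_squared_weights(grid))
```

In this toy case the busy corner cell receives a large positive z-score while the opposite corner receives a negative one, which mirrors the concentration-and-dissipation pattern described above.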
Examined visually, the trips' LDPC pattern shown in Figure 2 supports the hypothesis elicited earlier that the hot demand spots in the CBD area are driven by longer trips from peripheral census tracts and relatively shorter trips from the CBD and the adjacent census tracts. Yet this remains a hypothesis until tested using the robust Anselin Local Moran's I cluster and outlier analysis. The conceptualization of the spatial relationship in this analysis is based on the plain inverse distance between the census tracts, unlike the inverse squared distance adopted in the hotspot analysis on pick-ups and drop-offs and the subsequent analysis on OD-flow, for which there was strong evidence of an exponentially decaying trend with distance.
From the input signal on trips' LDPC, the Anselin Local Moran's I and the respective Z-scores are calculated as explained earlier. The Z-score and p-value reveal the statistical significance in the context of the hypothesis tested. Z-scores of large magnitude paired with small p-values indicate the presence of spatial association of high or low signals and lead to rejecting the null hypothesis, and the resulting cluster type labels each statistically significant geography as significant in high or low signal magnitude. This workflow highlights why this analysis should answer the question of the existence of spatial effects in trip characteristics better than other global or semi-global approaches, e.g., Getis-Ord Gi*, because its test statistic is computed location by location. Thus, a propagation pattern of trip characteristics from the CBD, if existing, can be observed and highlighted.

Figure 9 (b): OD-Pairs-Flow Hot Spots
The cluster and outlier analysis process was conducted using the ArcGIS Pro module on cluster and outlier analysis (Esri Inc., 2021a). The null hypothesis presumes complete spatial randomness of the system's examined signal, in this case, the trips' LDPC. Thus, the pattern of the census tracts' trip LDPC, and consequently the Anselin's Local I, is rearranged by random permutation around each signal location to test the null hypothesis of complete randomness against the alternative of a pronounced cluster pattern. In this procedure, a pseudo p-value, instead of the conventional p-value, is the basis for statistical significance. A pseudo p-value is the proportion of the random permutations in which the respective Anselin's Local Moran-I reflects a more pronounced clustering pattern than the real I. To elaborate, in a randomization process that comprises N permutations, each census tract develops a random distribution of Local Moran-I consisting of the N values computed during the permutations, which results in a "conditional permutation at each location" (Luc Anselin, 2016). The real I_i, obtained from the actual pattern, is then compared against each conditional permutation using the Z-score Z_{I_i}: every time Z_{I_i}^{(n)} from a permutation n ∈ {1, …, N} exhibits a more significant clustering pattern than Z_{I_i} for the real I_i, it counts toward the pseudo p-value proportion. Lastly, at a confidence level of 95%, if the pseudo p-value is less than 0.05, we reject the null hypothesis of complete randomness.
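The conditional permutation procedure can be sketched as follows. This is an illustrative implementation and not the ArcGIS Pro routine: the function names, the toy line of 40 tracts and the rule for "more pronounced" (a simulated index at least as extreme as the real one, in the direction of the real one) are assumptions of the sketch. The comparison is made on I_i directly, since for a fixed tract the Z-score is a monotone transformation of I_i.

```python
import numpy as np


def local_morans_i(values, weights):
    """Return the Local Moran's I of every location for a dense weight matrix."""
    x = np.asarray(values, dtype=float)
    n = x.size
    dev = x - x.mean()
    s2 = ((dev ** 2).sum() - dev ** 2) / (n - 1)  # S_i^2 leaves out location i
    w = np.array(weights, dtype=float)
    np.fill_diagonal(w, 0.0)  # a location is not its own neighbour
    return dev / s2 * (w @ dev)


def conditional_pseudo_p(values, weights, n_permutations=799, seed=0):
    """Pseudo p-value per location from conditional permutations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    n = x.size
    dev = x - x.mean()
    s2 = ((dev ** 2).sum() - dev ** 2) / (n - 1)
    w = np.array(weights, dtype=float)
    np.fill_diagonal(w, 0.0)
    observed = local_morans_i(x, w)
    pseudo_p = np.empty(n)
    for i in range(n):
        # Hold location i fixed and shuffle the signals of all other locations.
        others = np.delete(dev, i)
        w_i = np.delete(w[i], i)
        shuffled = rng.permuted(np.tile(others, (n_permutations, 1)), axis=1)
        simulated = dev[i] / s2[i] * (shuffled @ w_i)
        if observed[i] >= 0:
            more_extreme = (simulated >= observed[i]).sum()
        else:
            more_extreme = (simulated <= observed[i]).sum()
        pseudo_p[i] = more_extreme / n_permutations
    return observed, pseudo_p


# Hypothetical line of 40 tracts: low signals on one half, high on the other.
positions = np.arange(40, dtype=float)
signal = np.where(positions < 20, 1.0, 10.0)
gap = np.abs(positions[:, None] - positions[None, :])
np.fill_diagonal(gap, 1.0)  # placeholder; the diagonal weight is zeroed later
inverse_distance = 1.0 / gap
observed_i, pseudo_p = conditional_pseudo_p(signal, inverse_distance)
```

Tracts deep inside either half obtain a positive I_i and a very small pseudo p-value, whereas tracts near the boundary between the two halves do not.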
The precision of the experiment in rejecting or failing to reject the hypothesis is governed by the number of permutations performed. Relying entirely on the analytical approximation of the index (Local Moran-I) is expected to underperform inferentially (Luc Anselin, 2016), that is, to lead to ill-conceived hypotheses. Thus, 799 permutations, an arbitrary number set equal to the number of census tracts in the study area, are used to conduct the analysis. However, testing multiple hypotheses raises concerns about alpha-error inflation, i.e., false positives or type-I errors. Therefore, a false discovery rate (FDR) correction, as developed in (Benjamini & Hochberg, 1995), is adopted to control for the false positives.
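The FDR correction can be illustrated with the Benjamini-Hochberg step-up rule. The sketch below is a generic implementation of that rule; whether ArcGIS Pro applies exactly this variant is not stated here, and the function name and the ten example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of the tests that remain significant under FDR control."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value is compared against alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        last = np.nonzero(passed)[0].max()
        significant[order[: last + 1]] = True
    return significant


# Hypothetical pseudo p-values for ten census tracts.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
fdr_mask = benjamini_hochberg(example_p)
```

In this example five tracts fall below the uncorrected 0.05 threshold, but only the two smallest p-values survive the correction.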
The cluster and outlier analysis results in five types of clusters and outliers, as shown in Figure 10, and can be interpreted as follows:
1) High-High Cluster: a statistically significant cluster of pronounced high signal values, i.e., census tracts with a remarkably higher median of the trips' length-duration component
2) High-Low Outlier: census tracts whose trips' length-duration component is comparatively high but which are surrounded by tracts with low values, i.e., they fall within the significantly identified Low-Low cluster
3) Low-Low Cluster: a statistically significant cluster of pronounced small signal values, i.e., census tracts with a remarkably smaller median of the trips' length-duration component
4) Low-High Outlier: census tracts whose trips' length-duration component is comparatively low but which are surrounded by tracts with high values, i.e., they fall within the significantly identified High-High cluster
5) Not Significant: census tracts whose median trips' length-duration component is neither significantly high nor significantly low, and which therefore cannot be classified by signal value

Figure 10: Cluster and Outlier Analysis of TNC Trips' Length-Duration Principal Component
The results align closely with the initial observation of short trips concentrated in and around the CBD and longer trips in the peripheral census tracts. There is a clear buffer of not significant census tracts separating the two highly significant clusters, with only a trivial number of outliers as exceptions.

Conclusions
The workflow of the analysis conducted in this paper adopted two spatial statistical methods to reveal the underlying effects in ride-sourcing trip demand, OD-flow, and trip characteristics: (1) Getis-Ord Gi*; and (2) Local Anselin Moran's I, each in its respective domain of implementation. The key findings from the Getis-Ord Gi* implementation on trip ends and OD-flow hotspot analysis can be summarized as follows: (1) TNC trip ends, i.e., pick-ups and drop-offs, both show spatial heterogeneity, with statistically significant high rates concentrated within and around the CBD area; (2) TNC OD-flow analysis shows a strong exponentially decaying trend between the OD flow and the Euclidean distance (OD length); and (3) TNC OD-pairs show spatial heterogeneity, statistically significant too, with a larger footprint extending from the CBD toward the northwest census tracts. These agglomerations of demand in the census tracts of Chicago's CBD and its surroundings, represented by the significantly identified hotspots of pick-ups and drop-offs, along with the hot OD-flowlines concentrated in the CBD and vanishing toward the peripheral areas, should be further explored in a predictive analytics framework on spatially contextualized ridesourcing pick-ups and drop-offs, as well as in city-wide OD-flow modeling.
As for the Local Anselin Moran's I implementation on the clusters and outliers of trip length and duration characteristics, the results flag another key finding: significantly shorter trips are associated with the hot demand in the CBD, and longer trips, significant too, are concentrated in the lower-demand peripheral census tracts. These findings on the system trip ends and OD flow provide a clear picture of the existence of spatial effects in the ride-sourcing system from a demand perspective as well as in trip characteristics, and further analysis is required to explore whether such a pattern exists in willingness-to-share and mile-price. Moreover, the findings of this paper can be further utilized to fill two current practice and research gaps regarding: (1) standardizing ride-sourcing trip data handling protocols; and (2) incorporating spatial effects into agent-based modeling frameworks of ride-sourcing systems. The latter concerns the critical component of social networking and interaction present in agent-based models and its pertinence to spatial effects.

Conflicts of interest/Competing interests
Not applicable