Stories by Emre Havan on Medium

Building a Scalable Apple Health Authorization Management View for iOS

Emre Havan — Tue, 09 Jan 2024 08:20:36 GMT

Asking for Apple Health authorization from an iOS app usually seems pretty easy: we just call the requestAuthorization method. If the user has not yet made a decision, they will see the authorization sheet; if they have, they will see nothing. Well yeah, it’s pretty easy, but building a better experience for users, where they can see in detail whether they have authorized, which data types they have authorized, or whether there are new data types the app needs additional permissions for, can be a challenging and complicated task.

In this piece, we are going to build the same experience we have built for Fit Records. Let’s get started!

Disclaimer: In some parts of the code, we could have also used the new Swift Concurrency and Dependency Injection for testability. These are omitted from this article since it is focused on HealthKit integration.

Getting To Know Data Types and APIs

First, let’s identify the data types we are going to work with and take a look at the HealthKit APIs we are going to use.

In this example, we are going to request authorization for writing and reading data as follows:

Write Data Types: Active Calories Burned, Workouts

Read Data Types: Active Calories Burned, Heart Rate

Now the HealthKit APIs we will use:

requestAuthorization(toShare:read:completion:): A method to request authorization for the given write and read types. If the user has not yet seen the permission view for the given types, the system presents it; otherwise, the completion handler is called directly.
getRequestStatusForAuthorization(toShare:read:completion:): A method to find out whether the user has already seen the permission view for the given write and read types. This will help us show an “authorize additional permissions” button when we introduce new types in the future.
authorizationStatus(for:): A method to check the authorization status for the given health data type. IMPORTANT! This only reports the status for writing a data type; it is not possible to find out whether the user has granted permission to read a certain data type.

Now that we’ve reviewed the data types and the important APIs, we can get started with building our view.

Building the View

We are going to build our view using SwiftUI, but before we get started with building the view, it is important to first define what we want to achieve, then build the viewModel (ObservableObject) which will be responsible for managing the view state for different HealthKit authorization states.

What we want to achieve


We want to provide an informative Apple Health Integration Management View for our users, and it can be best described with its different states.

User has not seen the authorization view: In this state, the user has not yet made a decision on authorizing any health data. We want to show them a button to trigger the authorization flow.
User has denied authorization for all data types: In this state, the user has denied access to all health data. We want to show that Health is not integrated.
User has denied authorization for all write data types, but authorized some read data types: Since there is no way for us to know whether the user has granted access for a read type, if the user has denied access for all write types, we will assume Health is not integrated, and the UI will be the same as in the state above.
User has granted access for all or some write data types: In this state, we want to show users that the integration is active and indicate the authorization status for each write data type.
User has granted access for all or some write data types, but the app needs additional permission for a new write data type: In this state, we want to show users that the integration is active, indicate the authorization status for the determined data types, show that new data types are available for additional access, and provide a button to reauthorize Health.
User has granted access for all or some write data types, but the app needs additional permission for a new read data type: The state is the same as above, but this time the app introduced a new read data type. Since we cannot know whether the user has granted permission for a read type, we only want to show a button to reauthorize Health, without showing the status for this new read data type.

Phew, that’s a lot of states, and we want to cover all of them in our UI.

We better get to work!

Implementing the View Model

We are going to implement an observable view model that will manage all the logic around different states. Before we get started with it though, let’s implement an enum to describe all the possible states we described above.

We are actually going to need two enums, one for describing the write data types we are interested in, and the second one to manage the state of the authorization status.

First, AppleHealthWriteDataTypes for describing the write data types we will use:

enum AppleHealthWriteDataTypes: Hashable, Identifiable {
    var id: Self {
        return self
    }
    
    case activeCaloriesBurned(HKAuthorizationStatus)
    case workout(HKAuthorizationStatus)
}

The cases also have an associated value of type HKAuthorizationStatus, which will help us indicate whether a data type is authorized later on when we draw the UI. The enum also conforms to Hashable and Identifiable so we can use it in a ForEach.

Now the main enum to manage all the state, HealthKitIntegrationState :

enum HealthKitIntegrationState {
    case healthDataNotAvailable
    case loading
    case notDetermined
    case determined([AppleHealthWriteDataTypes])
    case partiallyDetermined(determinedWriteTypes: [AppleHealthWriteDataTypes], nonDeterminedWriteTypes: [AppleHealthWriteDataTypes])
}

Let’s go over them one by one:

healthDataNotAvailable: HealthKit is not available on older iPads and is only partially available on MacBooks, so depending on the user’s device, the health data may not be available. Learn more in the documentation for isHealthDataAvailable
loading: This is the state we will set initially, until we get more information about the user’s Apple Health state
notDetermined: User has not made a decision yet
determined: The Apple Health state is determined
partiallyDetermined: User previously determined the state for data types, but the app introduced new data types that require additional determination

Some Helpers

We are also going to implement some helper entities to get the read and write types we use, request authorization and check authorization status for given types.

AppleHealthUsedDataTypeProvider

A simple enum with two static functions to provide read and write data types for HealthKit

enum AppleHealthUsedDataTypeProvider {
    static func provideReadTypes() -> Set<HKObjectType>? {
        guard HKHealthStore.isHealthDataAvailable(),
              let activeCaloriesBurned = HKObjectType.quantityType(forIdentifier: .activeEnergyBurned),
              let heartRate = HKObjectType.quantityType(forIdentifier: .heartRate) else {
            return nil
        }
        return [activeCaloriesBurned, heartRate]
    }

    static func provideWriteTypes() -> Set<HKSampleType>? {
        guard HKHealthStore.isHealthDataAvailable(),
              let activeCaloriesBurned = HKObjectType.quantityType(forIdentifier: .activeEnergyBurned) else {
            return nil
        }
        return [activeCaloriesBurned, .workoutType()]
    }
}

AppleHealthAuthorisationRequester

Another simple enum with two methods to request authorization and get the authorization status from HealthKit

enum AppleHealthAuthorisationRequester {
    static func requestAuthorisation(onCompletion: @escaping () -> Void) {
        guard let writeDataTypes = AppleHealthUsedDataTypeProvider.provideWriteTypes(),
              let readDataTypes = AppleHealthUsedDataTypeProvider.provideReadTypes() else {
            DispatchQueue.main.async {
                onCompletion()
            }
            return
        }

        HKHealthStore().requestAuthorization(toShare: writeDataTypes, read: readDataTypes) { success, error in
            DispatchQueue.main.async {
                onCompletion()
            }
        }
    }

    static func requestStatusForAuthorisation(onCompletion: @escaping (HKAuthorizationRequestStatus) -> Void) {
        guard let writeDataTypes = AppleHealthUsedDataTypeProvider.provideWriteTypes(),
              let readDataTypes = AppleHealthUsedDataTypeProvider.provideReadTypes() else {
            DispatchQueue.main.async {
                onCompletion(.unknown)
            }
            return
        }

        HKHealthStore().getRequestStatusForAuthorization(toShare: writeDataTypes, read: readDataTypes) { status, error in
            DispatchQueue.main.async {
                onCompletion(status)
            }
        }
    }
}

Great, now we can finally get started with implementing the view model! Enter, AppleHealthIntegrationViewModel

final class AppleHealthIntegrationViewModel: ObservableObject {
    @Published var state: HealthKitIntegrationState = .loading

    init() {
        getAuthorisationStatusForAppleHealthDataTypes()
    }

    private func getAuthorisationStatusForAppleHealthDataTypes() {
        // Will be implemented soon
    }
}

For now, it’s a simple view model with a published state property initially set to .loading, and a method called right after initialisation to get the authorization status.

Computing the HealthKit Authorization State

Now we are going to implement the body of getAuthorisationStatusForAppleHealthDataTypes to check the authorization status and update our state with the needed associated values. Before doing that, though, we need to add a new property to the AppleHealthWriteDataTypes enum we implemented earlier, to make our lives easier later on:

extension AppleHealthWriteDataTypes {
    var isDetermined: Bool {
        switch self {
        case .activeCaloriesBurned(let authStatus),
                .workout(let authStatus):
            return authStatus != .notDetermined
        }
    }
}

isDetermined will be used to identify the determined types easily.

Okay now we can get back to implementing getAuthorisationStatusForAppleHealthDataTypes in the view model:

private func getAuthorisationStatusForAppleHealthDataTypes() {
    // 1
    guard let activeCaloriesBurned = HKObjectType.quantityType(forIdentifier: .activeEnergyBurned) else {
        state = .notDetermined
        return
    }

    let healthStore = HKHealthStore()

    // 2
    // Workout Write Type
    let workoutAuthorisationStatus = healthStore.authorizationStatus(for: .workoutType())
    let workoutType: AppleHealthWriteDataTypes = .workout(workoutAuthorisationStatus)

    // Active Calories Write Type
    let activeCaloriesBurnedAuthorisationStatus = healthStore.authorizationStatus(for: activeCaloriesBurned)
    let activeCaloriesBurnedType: AppleHealthWriteDataTypes = .activeCaloriesBurned(activeCaloriesBurnedAuthorisationStatus)
    
    // 3
    let appleHealthWriteTypes = [workoutType, activeCaloriesBurnedType]
    let determinedWriteHealthTypes = appleHealthWriteTypes.filter { $0.isDetermined }
    let nonDeterminedWriteHealthTypes = appleHealthWriteTypes.filter { $0.isDetermined == false }

    // 4
    AppleHealthAuthorisationRequester.requestStatusForAuthorisation { status in
        self.updateState(
            requestStatusForAuthorisation: status,
            nonDeterminedWriteHealthTypes: nonDeterminedWriteHealthTypes,
            determinedWriteHealthTypes: determinedWriteHealthTypes,
            appleHealthWriteTypes: appleHealthWriteTypes
        )
    }
}

private func updateState(
    requestStatusForAuthorisation: HKAuthorizationRequestStatus,
    nonDeterminedWriteHealthTypes: [AppleHealthWriteDataTypes],
    determinedWriteHealthTypes: [AppleHealthWriteDataTypes],
    appleHealthWriteTypes: [AppleHealthWriteDataTypes]
) {
    // 5
    switch requestStatusForAuthorisation {
    case .unknown:
        // 6
        state = .healthDataNotAvailable
    case .unnecessary:
        // 7
        state = .determined(determinedWriteHealthTypes)
    case .shouldRequest:
        // 8
        if determinedWriteHealthTypes.count == 0 {
            state = .notDetermined
        } else {
            state = .partiallyDetermined(determinedWriteTypes: determinedWriteHealthTypes, nonDeterminedWriteTypes: nonDeterminedWriteHealthTypes)
        }
    }
}

Let’s go through what we do step by step, as indicated by the numbered comments:

1) First, we declare an HKQuantityType for the active calories burned and create an HKHealthStore.
2) Then we get the authorization status for our write types and wrap them in our custom types for further processing. You might be wondering why we need an additional enum instead of working directly with the HKSampleTypes provided by HealthKit. It’s because HKSampleType is not an enum, so we cannot easily switch on it to identify which kind of sample type it is. Thus, we map them to our own health data type, AppleHealthWriteDataTypes, which we created earlier.
3) Then we declare three arrays. The first, appleHealthWriteTypes, includes all the data types we want to write. The second, determinedWriteHealthTypes, includes only the types whose state the user has already determined (authorized or denied), obtained by filtering the first array with the isDetermined property we just added. The last, nonDeterminedWriteHealthTypes, includes the types that are not yet determined, meaning the user hasn’t made a decision for them yet.
4) Then we ask for the authorization request status with our helper AppleHealthAuthorisationRequester and pass the resulting HKAuthorizationRequestStatus to the updateState method, along with the arrays we declared earlier.
5) In the updateState method, we switch on requestStatusForAuthorisation.
6) If the case is .unknown, an error occurred while retrieving the request status, so we set our view’s state to .healthDataNotAvailable. (You could alternatively implement a separate state for this to show a different error, but we used the same state as for when the health data is not available on the device.)
7) If the case is .unnecessary, the user has already made a decision for the given read and write types, so we set our state to .determined with the determinedWriteHealthTypes we created earlier.
8) Finally, for the case .shouldRequest, we need to do a bit more to understand whether we are requesting authorization for the first time, or whether we requested it in the past and now need it again because we introduced additional data types in a later version. We find out by looking at the determinedWriteHealthTypes count. If it’s greater than 0, some write types were determined earlier and we need to request additional health types, so we set our state to .partiallyDetermined, passing both determinedWriteHealthTypes and nonDeterminedWriteHealthTypes. If the count is 0, we set our state to .notDetermined.
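
The branching in steps 5 to 8 can also be expressed as a pure function, which makes it easy to unit test without a device. The following is only a sketch under stated assumptions: RequestStatus, WriteStatus, IntegrationState, and resolveState are hypothetical stand-ins that mirror HKAuthorizationRequestStatus, HKAuthorizationStatus, HealthKitIntegrationState, and updateState so that the example compiles without HealthKit. They are not part of HealthKit or of the view model above.

```swift
// Stand-in enums so the sketch compiles without HealthKit. In the app these
// would be HKAuthorizationRequestStatus and HKAuthorizationStatus.
enum RequestStatus { case unknown, shouldRequest, unnecessary }
enum WriteStatus { case notDetermined, sharingDenied, sharingAuthorized }

// Simplified, hypothetical mirror of HealthKitIntegrationState.
enum IntegrationState: Equatable {
    case healthDataNotAvailable
    case notDetermined
    case determined([WriteStatus])
    case partiallyDetermined(determined: [WriteStatus], nonDetermined: [WriteStatus])
}

// Hypothetical pure version of updateState: same branching, no side effects.
func resolveState(requestStatus: RequestStatus, writeStatuses: [WriteStatus]) -> IntegrationState {
    let determined = writeStatuses.filter { $0 != .notDetermined }
    let nonDetermined = writeStatuses.filter { $0 == .notDetermined }

    switch requestStatus {
    case .unknown:
        return .healthDataNotAvailable
    case .unnecessary:
        return .determined(determined)
    case .shouldRequest:
        if determined.isEmpty {
            return .notDetermined
        }
        return .partiallyDetermined(determined: determined, nonDetermined: nonDetermined)
    }
}

// First launch: the user has not decided on anything yet.
let firstLaunch = resolveState(
    requestStatus: .shouldRequest,
    writeStatuses: [.notDetermined, .notDetermined]
)

// A new write type was added after the user had already decided on the others.
let afterUpdate = resolveState(
    requestStatus: .shouldRequest,
    writeStatuses: [.sharingAuthorized, .sharingDenied, .notDetermined]
)

print(firstLaunch)
print(afterUpdate)
```

Keeping the decision logic free of HKHealthStore calls like this is one possible way to cover every state in unit tests; the view model would then only gather the statuses and assign the result to state.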

With this logic in place, we can now cover all cases we wanted to achieve. Please note that, since there is no way to identify if a data type for reading values from is authorized, if your app is only concerned with reading data from HealthKit, this approach won’t work. Similarly, also for Fit Records, if the user only enables reading data but disables writing data to HealthKit, the UI will look as if the user has denied all access. This is something we are fine with, given that there is no additional API to verify the status for read types, and our app really needs the write authorization to properly implement HealthKit integration :)

Before we move on to the view, we will implement one more thing for our view model. As you will see in the view examples in the next section, the user can jump to the Health app to make modifications and then come back to our app. Since we show a detailed write type authorization status, we need to make sure our view won’t show an outdated state when the user comes back, in case they made any changes.

We will achieve this by observing the willEnterForeground notification and recomputing our state as follows:

Inside the init we will add the view model as an observer for the UIApplication.willEnterForegroundNotification notification:

NotificationCenter.default.addObserver(
    self,
    selector: #selector(appWillEnterForeground),
    name: UIApplication.willEnterForegroundNotification,
    object: nil
)

Then we will also implement the appWillEnterForeground method where we trigger getting authorization status:

@objc
private func appWillEnterForeground() {
    getAuthorisationStatusForAppleHealthDataTypes()
}

That’s it. Now, if the user jumps to the Health app and comes back to Fit Records, we will show the most up-to-date HealthKit integration state :)

Implementing the View

For brevity, we are not going to go into details of all the subviews, but we will briefly discuss their implementation details. Drop a comment if you would like another article showing the implementation details of the subviews though :)

Introducing AppleHealthIntegrationView

struct AppleHealthIntegrationView: View {
    
    @StateObject var viewModel: AppleHealthIntegrationViewModel
    
    var body: some View {
        VStack {
            VStack {
                switch viewModel.state {
                case .healthDataNotAvailable:
                    makeHealthDataNotAvailableView()
                case .loading:
                    ProgressView()
                case .notDetermined:
                    makeNonDeterminedView()
                case .determined(let determinedTypes):
                    makeDeterminedView(determinedTypes: determinedTypes)
                case .partiallyDetermined(let determinedTypes, let nonDeterminedTypes):
                    makePartiallyDeterminedView(determinedTypes: determinedTypes, nonDeterminedTypes: nonDeterminedTypes)
                }
            }
            .padding()
            .background(Color(uiColor: .systemGray6).cornerRadius(16.0))
            .padding()
            Spacer()
        }
    }
}

It is a simple view with one StateObject, the viewModel. In its body, we switch on the viewModel’s state to draw our UI. Let’s discuss how and what we show for each state:

makeHealthDataNotAvailableView()

This view is shown when health data is not available on the user’s device, and it looks like the following:

makeNonDeterminedView()

This view is shown when the user has not interacted with Apple Health authorization yet. It shows some descriptive labels and an Integrate button that triggers the authorization request to the system by calling the previously implemented requestAuthorisation method of AppleHealthAuthorisationRequester. It looks like the following:

makeDeterminedView(determinedTypes:)

This method takes an array of determined write types and provides one of two views. If at least one of the write types is authorized, it shows the authorization status of each provided determined write type using a ForEach (a checkmark if authorized, an xmark if denied):

But if all the write types are denied, it shows the following view:

In order to easily identify whether at least one write type is authorized, we first need to add another extension to AppleHealthWriteDataTypes :

extension AppleHealthWriteDataTypes {
    var isAuthorised: Bool {
        switch self {
        case .activeCaloriesBurned(let authStatus),
                .workout(let authStatus):
            return authStatus == .sharingAuthorized
        }
    }
}

And in AppleHealthIntegrationViewModel we need to add an internal method for the view to interact with, and a private method to check authorization status for each determined write type (The view could actually compute this with the data provided from viewModel’s state, but we kept these methods in the view model to keep the view as logic free as possible):

func isAtLeastOneWriteTypeAuthorised() -> Bool {
    switch state {
    case .healthDataNotAvailable, .loading, .notDetermined:
        assertionFailure("This method shouldn't have been called for a state other than determined or partially determined!")
        return false
    case .determined(let healthTypes):
        return isAtLeastOneTypeIsAuthorised(determinedTypes: healthTypes)
    case .partiallyDetermined(let determinedTypes, _):
        return isAtLeastOneTypeIsAuthorised(determinedTypes: determinedTypes)
    }
}

private func isAtLeastOneTypeIsAuthorised(determinedTypes: [AppleHealthWriteDataTypes]) -> Bool {
    for determinedType in determinedTypes {
        if determinedType.isAuthorised {
            return true
        }
    }
    return false
}

makePartiallyDeterminedView(determinedTypes:, nonDeterminedTypes:)

As the last possible subview, this one shows the authorization status for the determined write types and, in a subsection, the types that are yet to be determined. It uses two separate ForEach views for determinedTypes and nonDeterminedTypes, plus a button to trigger the reauthorization flow for the new data types. It looks like the following (imagine we are introducing Height as a new type to write data in a future version):

That’s it! We have seen and discussed the views for all the possible states of a user in our HealthKit authorization management view. Although we omitted the implementation details for the views, with the APIs and the view model provided, you should be able to build a similar view for your apps in no time :)

Introducing a New Health Data Type at a Later Version

We have implemented the management view and its view model, and also added support for showing new, yet-to-be-determined data types in a subsection, along with an “Authorize Additional Access” button. But with the current setup, how easy is it to introduce a new type?

Let’s have a look.

Imagine we want to introduce Height as a new data type to write to HealthKit. To do that, we only need to update a few places in our code.

First, we need to update AppleHealthWriteDataTypes to include a new case for the height, and also update its isDetermined and isAuthorised extensions:

enum AppleHealthWriteDataTypes: Hashable, Identifiable {
    ...
    case height(HKAuthorizationStatus)
}

extension AppleHealthWriteDataTypes {
    var isDetermined: Bool {
        ...
            .height(let authStatus):
            return authStatus != .notDetermined
        }
    }
}

extension AppleHealthWriteDataTypes {
    var isAuthorised: Bool {
        ...
            .height(let authStatus):
            return authStatus == .sharingAuthorized
        }
    }
}

Then we need to update AppleHealthUsedDataTypeProvider to include height in the write types set:

enum AppleHealthUsedDataTypeProvider {
    ...
    
    static func provideWriteTypes() -> Set<HKSampleType>? {
        guard HKHealthStore.isHealthDataAvailable(),
              let activeCaloriesBurned = HKObjectType.quantityType(forIdentifier: .activeEnergyBurned),
              let height = HKObjectType.quantityType(forIdentifier: .height) else {
            return nil
        }
        return [activeCaloriesBurned, .workoutType(), height]
    }
}

Then finally, in the getAuthorisationStatusForAppleHealthDataTypes method of AppleHealthIntegrationViewModel we include the new type as:

private func getAuthorisationStatusForAppleHealthDataTypes() {
    guard let activeCaloriesBurned = HKObjectType.quantityType(forIdentifier: .activeEnergyBurned),
          let height = HKObjectType.quantityType(forIdentifier: .height) else {
        state = .notDetermined
        return
    }

    ...
    
    // Height Write Type
    let heightStatus = healthStore.authorizationStatus(for: height)
    let heightType: AppleHealthWriteDataTypes = .height(heightStatus)
    
    let appleHealthWriteTypes = [workoutType, activeCaloriesBurnedType, heightType]

    ...
}

That’s all! Now, if we run the app again for a user who has already authorized some write types, they will see the new type available, just like in the image shared for the partially determined view in the section above.

Final Words

In this piece, we have implemented a scalable HealthKit authorization state management logic, and also an accompanying view to inform users of their current state of HealthKit Integration, same as we did for Fit Records. We have made sure to show the authorization status for each write data type when determined, and added support for introducing new data types in the future, while also caring for the states where authorization is not determined, or denied.

I hope you found this article useful. Let me know what you think about it in the comments section. How are you managing Health Integration State? :)

Also if you are looking for a modern iOS App to track your workouts and exercises, give us a shot!

Until next time 👋

Building a Scalable Apple Health Authorization Management View for iOS was originally published in Fit Records on Medium, where people are continuing the conversation by highlighting and responding to this story.

What you didn’t know about URLSessionConfiguration’s waitsForConnectivity

Emre Havan — Fri, 17 Nov 2023 12:25:38 GMT

Configure your URLSessions the right way for waitsForConnectivity feature

Continue reading on Level Up Coding »

Use PassthroughSubject the Right Way in Your APIs With Combine

Emre Havan — Thu, 26 Oct 2023 13:42:21 GMT

There you are, trying to refactor the usage of that nasty notification center or implementing a new API where the consumers can observe…

Continue reading on Better Programming »

Writing a modern iOS Networking Library with Swift Concurrency

Emre Havan — Mon, 06 Mar 2023 09:09:34 GMT

Photo by Conny Schneider on Unsplash

In the ever-changing nature of software development, we often find ourselves needing to rewrite some of our code and libraries. Getir is no exception: although our iOS project’s networking library was working fine, we wanted to rewrite it.

Let’s take a look at the benefits of a library rewrite:

Support the modernization of the codebase
Provide easy-to-use APIs
Leverage newer system APIs for better performance and reliability
Transfer the ownership of code to the new developers

In addition to the benefits mentioned above, we also wanted to rewrite the Networking library in particular for the following reasons:

Codable support
Async-await support for the new Swift concurrency
Mock support for UI tests
Flexible API for different needs of different parts of the project
Increasing the unit test coverage of our Networking stack

In this article, we are going to talk about our adventure in this rewriting process, our approaches, the challenges faced, and the lessons learned along the way. Let’s get started! 🚀

Model Structure

In terms of the models used for the request and response types with the new networking, we wanted to use Decodable and Encodable, which provide us with easy-to-use decoding and encoding APIs.

Request Structure

We wanted to provide an easy way to describe the details of a request, such as an endpoint path, parameters, HTTP method, and any request-specific headers.

Thus, we’ve created the following protocol:

public protocol NetworkRequestable {
  var baseURL: String { get }
  var method: HTTPMethod { get }
  var path: String { get }
  var parameters: Encodable? { get }
  var headers: [String: String]? { get }
}

Additionally, we provided the following extension, since not all requests need parameters or headers:

extension NetworkRequestable {
  public var parameters: Encodable? {
    nil
  }
  
  public var headers: [String: String]? {
    nil
  }
}

An example request struct would now look like this:

struct SampleRequest: NetworkRequestable {
  var method: HTTPMethod = .post 
  var baseURL = "https://my-base-url.com"
  var path = "my/login/path"
  var parameters: Encodable?
}

HTTPMethod is a basic enum describing the method to use for the request; its implementation details are omitted for brevity.

The model used for parameters is now a simple Encodable that looks like the following:

struct SampleRequestParameters: Encodable {
  let testProperty: String
  let secondTestProperty: String
}

Finally the initialization of the request:

let parameters = SampleRequestParameters(testProperty: "test", secondTestProperty: "test2")
let request = SampleRequest(parameters: parameters)

That’s it. The request is ready to be fired with the networking. But now we have another problem: the base URL of a service doesn’t change often, yet here we provide it in the request, which means we would have to provide it in every other request as well. That is repetitive and unnecessary, and it would be very hard to change all over the place if needed.

As a solution, we introduced a new protocol to define the baseURL, which acts as a middleman between requests and NetworkRequestable. Now all requests that need to connect to the same baseURL can conform to it.

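protocol MyDomainSpecificRequest: NetworkRequestable {}

// Protocols cannot hold stored values, so the default lives in an extension.
extension MyDomainSpecificRequest {
  var baseURL: String { "https://my-base-url.com" }
}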

Now the SampleRequest can simply conform to MyDomainSpecificRequest and leave the baseURL out:

struct SampleRequest: MyDomainSpecificRequest {
  var method: HTTPMethod = .post
  var path = "my/login/path"
  var parameters: Encodable?
}

You might be wondering, why not just inject the baseURL in the Networking API? Because we wanted a single instance of a Networking to communicate with different services, and also wanted to keep it as lean as possible. Additionally, with this approach, we can keep our base URLs dynamic, outside of the Networking package. We could provide the baseURL conditionally in MyDomainSpecificRequest for example:

protocol MyDomainSpecificRequest: NetworkRequestable {}

extension MyDomainSpecificRequest {
  var baseURL: String {
    switch ExampleState.environment {
    case .development:
      return "https://my-base-url-development.com"
    case .production:
      return "https://my-base-url.com"
    }
  }
}

Response Structure

When we make a network request, we often expect a response. Additionally, at Getir, we provide certain metadata that is available with every response. Let’s have a look at the response structure:

{
  "expectedResponse": { // Request specific response
    "testField": "testValue",
    "testFieldTwo": 2,
    ...
  },
  "metadata" : { // Metadata sent with every response
    "additionalField": "Test",
    ...
  }
}

So we needed to implement a way to provide the expected response and also the additional metadata attached to it. Previously, this was done with subclassing: every response type would subclass the base response where the metadata properties were declared. But with the new implementation, we wanted to keep our response models as value types, so we implemented a struct with a generic response type to achieve this:

public struct SuccessResponseWrapper<T: Decodable>: Decodable {
  public let metadata: ResponseMetadata 
  public let expectedResponse: T
}

So whenever a request succeeds the callers will receive a wrapper, which contains the expected response, and the metadata. The only requirement of the actual response type is that it should be a Decodable.
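
To make the wrapper more concrete, here is a minimal decoding sketch based on the sample JSON shown earlier. The ResponseMetadata fields, the SampleResponse model, and the decodeSample helper are assumptions made for illustration; the real metadata model and decoding pipeline are not shown in this article.

```swift
import Foundation

// Hypothetical metadata model: the real fields are not shown in the article.
struct ResponseMetadata: Decodable {
    let additionalField: String
}

// Generic wrapper: any Decodable type can be used as the expected response.
struct SuccessResponseWrapper<T: Decodable>: Decodable {
    let metadata: ResponseMetadata
    let expectedResponse: T
}

// Hypothetical request-specific response matching the sample JSON.
struct SampleResponse: Decodable {
    let testField: String
    let testFieldTwo: Int
}

let sampleJSON = """
{
  "expectedResponse": { "testField": "testValue", "testFieldTwo": 2 },
  "metadata": { "additionalField": "Test" }
}
"""

// Illustrative helper: decodes the wrapper for one concrete response type.
func decodeSample(from json: String) throws -> SuccessResponseWrapper<SampleResponse> {
    try JSONDecoder().decode(
        SuccessResponseWrapper<SampleResponse>.self,
        from: Data(json.utf8)
    )
}

do {
    let wrapper = try decodeSample(from: sampleJSON)
    print(wrapper.expectedResponse.testField, wrapper.metadata.additionalField)
} catch {
    print("Decoding failed: \(error)")
}
```

Because the wrapper is generic, the same struct can be reused for every endpoint, and each response model stays a plain value type.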

Constructing the Networking

Now that we have talked about the request and response model structures, we can move forward with the Networking implementation. While constructing the Networking, we wanted to create distinct layers, adhere to the single responsibility principle, and make it quick to write tests for certain interactions.

To keep things concise, we will only discuss the four primary entities, even though the actual implementation has additional dependencies.

RequestFactory
RequestExecutor
RequestAdapters
ResponseParser

RequestFactory

The request factory is responsible for translating a NetworkRequestable into a URLRequest for the request execution. It has only one internal method and looks like the following:

func makeURLRequest<T: NetworkRequestable>(with request: T) throws -> URLRequest

It does the following in the given order:

Constructs the URL by combining the baseURL and the path
Encodes the parameters, if there are any, to Data with JSONEncoder
Applies the correct encoding (JSON or URL encoding) depending on the HTTPMethod
Finally, adds the headers if they exist, and returns the URLRequest
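The steps above can be sketched roughly as follows. This is not the actual RequestFactory: the shape of NetworkRequestable, the HTTPMethod cases, RequestFactoryError and SearchRequest are all assumptions made so that the example is self-contained.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical stand-ins; the real NetworkRequestable may look different.
enum HTTPMethod: String { case get = "GET", post = "POST" }

protocol NetworkRequestable {
    associatedtype Parameters: Encodable
    var baseURL: String { get }
    var path: String { get }
    var method: HTTPMethod { get }
    var headers: [String: String]? { get }
    var parameters: Parameters? { get }
}

enum RequestFactoryError: Error { case invalidURL }

struct RequestFactory {
    func makeURLRequest<T: NetworkRequestable>(with request: T) throws -> URLRequest {
        // 1. Construct the URL by combining the baseURL and the path.
        guard let url = URL(string: request.baseURL + request.path) else {
            throw RequestFactoryError.invalidURL
        }
        var urlRequest = URLRequest(url: url)
        urlRequest.httpMethod = request.method.rawValue

        // 2. Encode the parameters, if there are any.
        if let parameters = request.parameters {
            let data = try JSONEncoder().encode(parameters)
            // 3. Apply JSON or URL encoding depending on the method.
            switch request.method {
            case .post:
                urlRequest.httpBody = data
            case .get:
                let object = try JSONSerialization.jsonObject(with: data)
                var components = URLComponents(url: url, resolvingAgainstBaseURL: false)
                components?.queryItems = (object as? [String: Any])?.map {
                    URLQueryItem(name: $0.key, value: String(describing: $0.value))
                }
                urlRequest.url = components?.url ?? url
            }
        }

        // 4. Add the headers if they exist.
        request.headers?.forEach { urlRequest.setValue($0.value, forHTTPHeaderField: $0.key) }
        return urlRequest
    }
}

// Hypothetical request used to exercise the factory.
struct SearchParameters: Encodable { let term: String }

struct SearchRequest: NetworkRequestable {
    let baseURL = "https://example.com"
    let path = "/search"
    let method = HTTPMethod.post
    let headers: [String: String]? = ["Accept": "application/json"]
    let parameters: SearchParameters? = SearchParameters(term: "swift")
}
```

Keeping the factory free of any URLSession knowledge is what makes it easy to unit test: a test only needs to build a request value and inspect the resulting URLRequest.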

Encoding

For JSON encoding, it was trivial: we could just encode the Encodable parameters as Data and set it as the httpBody of the URLRequest. But when it came to URL encoding, things were a bit more tricky.

The query parameters were created by using JSONSerialization (force unwrapping is used for example purposes):

let queryParameters = try! JSONSerialization.jsonObject(
  with: parameters,
  options: .fragmentsAllowed
) as! [String: Any]

var urlComponents = URLComponents(url: url, resolvingAgainstBaseURL: false)!

urlComponents.queryItems = queryParameters.map {
  URLQueryItem(name: $0.key, value: String(describing: $0.value))
}

Although it looked fine at first sight, we soon noticed a problem with this approach: boolean query values were sent as 1 and 0 instead of true and false. This happens because JSONSerialization represents booleans as NSNumber instances backed by CFBoolean, which print as integers when passed directly to String(describing:).

To address the issue, we needed to check if the value was of type NSNumber when it is initialized with a boolean value. Additionally, we also needed to verify if the value could be converted to a Bool to ensure that we only convert actual boolean values and not mistakenly convert integer values to true or false.

Now the queryItems initialization looks like the following:

urlComponents.queryItems = queryParameters.map {
  if type(of: $0.value) == type(of: NSNumber(value: true)),
    let value = $0.value as? Bool {
      return URLQueryItem(name: $0.key, value: "\(value)")
  }
  return URLQueryItem(name: $0.key, value: String(describing: $0.value))
}
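For reference, here is a self-contained sketch of the same boolean check wrapped in a function. SearchParameters and makeQueryItems are hypothetical names, and the type comparison relies on how JSONSerialization represents booleans on Apple platforms.

```swift
import Foundation

// Hypothetical parameters used only for this sketch.
struct SearchParameters: Encodable {
    let isActive: Bool
    let page: Int
}

/// Builds query items from JSON-encoded parameters, keeping booleans as "true"/"false".
func makeQueryItems(from parameters: Data) throws -> [URLQueryItem] {
    let object = try JSONSerialization.jsonObject(with: parameters)
    guard let dictionary = object as? [String: Any] else { return [] }

    // The concrete class used for boolean NSNumber values.
    let booleanType = type(of: NSNumber(value: true))

    // Sorted by key only to keep the order of the items stable.
    return dictionary.sorted { $0.key < $1.key }.map { item -> URLQueryItem in
        if type(of: item.value) == booleanType, let flag = item.value as? Bool {
            return URLQueryItem(name: item.key, value: String(flag))
        }
        return URLQueryItem(name: item.key, value: String(describing: item.value))
    }
}

let encodedParameters = try JSONEncoder().encode(SearchParameters(isActive: true, page: 1))
let queryItems = try makeQueryItems(from: encodedParameters)
```

The type check matters because an integer such as 1 can also be cast to Bool successfully; without it, page=1 could be turned into page=true.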

RequestExecutor

The request executor is responsible for executing the requests with the injected URLSession. It has one internal method:

func execute(_ request: URLRequest) async -> Result<ExecutionSuccessModel, Error> {
  do {
    let (data, response) = try await session.data(for: request)
    return .success(ExecutionSuccessModel(data: data, response: response))
  } catch let error {
    return .failure(error)
  }
}

If the execution succeeds, it provides an ExecutionSuccessModel, which consists of Data and URLResponse. If it throws an error, the error is returned to the Networking.

RequestAdapters

Request adapters allow for final modifications of requests before they are executed. They are injected into Networking and applied sequentially after the URLRequest is created by the request factory.

All the adapters must conform to RequestAdaptation protocol, which only has one requirement:

public protocol RequestAdaptation {
  func adapt(request: URLRequest) -> URLRequest
}

So an adapter can take a request, do something with it and provide it back.

Next, we will take a look at an adapter that provides the default headers for every outgoing request:

public final class DefaultHTTPHeaderAdapter: RequestAdaptation {
  private var headerProvidingClosure: () -> ([String: String])
  
  public init(headerProvidingClosure: @escaping () -> [String: String]) {
    self.headerProvidingClosure = headerProvidingClosure
  }

  public func adapt(request: URLRequest) -> URLRequest {
    var mutableRequest = request
    let headers = headerProvidingClosure()
    headers.forEach {
      mutableRequest.setValue($0.value, forHTTPHeaderField: $0.key)
    }
    return mutableRequest
  }
}

The DefaultHTTPHeaderAdapter is initialized with a headerProvidingClosure, a closure that takes nothing and provides headers as key-value pairs when needed. The adapter can later be injected into the Networking so that every outgoing request has the default headers applied.
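To illustrate how several adapters compose, here is a small sketch that applies them one after another. AuthorizationAdapter, TimeoutAdapter and applyAdapters are hypothetical and not part of the library; only the RequestAdaptation protocol is taken from above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

public protocol RequestAdaptation {
    func adapt(request: URLRequest) -> URLRequest
}

// Hypothetical adapter that adds a bearer token when one is available.
struct AuthorizationAdapter: RequestAdaptation {
    let tokenProvider: () -> String?

    func adapt(request: URLRequest) -> URLRequest {
        guard let token = tokenProvider() else { return request }
        var adapted = request
        adapted.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
        return adapted
    }
}

// Hypothetical adapter that overrides the timeout of every request.
struct TimeoutAdapter: RequestAdaptation {
    let timeout: TimeInterval

    func adapt(request: URLRequest) -> URLRequest {
        var adapted = request
        adapted.timeoutInterval = timeout
        return adapted
    }
}

/// Applies the adapters sequentially; each one receives the output of the previous one.
func applyAdapters(_ adapters: [RequestAdaptation], to request: URLRequest) -> URLRequest {
    adapters.reduce(request) { partial, adapter in adapter.adapt(request: partial) }
}
```

Because each adapter returns a new URLRequest value instead of mutating shared state, the order of the array is the only thing that determines which adapter wins when two of them touch the same field.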

ResponseParser

The response parser converts the retrieved data into the expected type + metadata for a request call. It returns the previously mentioned success model or a network error in case of any issues.

Networking

Now that we talked about all the important pieces, we can construct the Networking. In addition to the entities we talked about, the Networking also has its own URLSession. We are going to inject all our entities in the init method so that we can write unit tests easily later on:

final public class Networking {
    private let requestMaker: RequestMaking
    private let requestExecutor: RequestExecuting
    private let requestAdapters: [RequestAdaptation]
    private let responseParser: ResponseParsing
    private let session: URLSession
    
    public init(requestMaker: /* all entities are injected */) {
        // properties are initialised
    }
}

Here we ran into another problem: consumers of the library don’t need to know about all these internal entities. However, to make the init public, so that Networking can be initialized from outside while we keep using dependency injection, we would have to make all our internal entity protocols public as well. Any consumer could then implement these protocols and provide their own implementations, which is not what we want.

Convenience init to the rescue
To solve this issue, we created a public convenience initialiser and kept the original one internal, resulting in two separate init methods:

public convenience init(requestAdapters: [RequestAdaptation] = []) {
    self.init(requestMaker: RequestFactory(),
              requestExecutor: RequestExecutor(),
              requestAdapters: requestAdapters,
              responseParser: ResponseParser())
}

init(requestMaker: RequestMaking, requestExecutor: RequestExecuting, requestAdapters: [RequestAdaptation], responseParser: ResponseParsing) {
    // properties initialised
}

We made only the RequestAdaptation public, keeping all other entities and protocols internal as planned. This allowed unit tests to use the internal init method with dependency injection. More details on this approach can be found here.

Implementing the request method

Networking implements the NetworkRequestProviding protocol and provides an async method to make network requests.

public func executeRequest<V: NetworkRequestable, T: Decodable>(
  request: V,
  responseType: T.Type
) async -> Result<SuccessResponseWrapper<T>, NetworkingError> {
  // some syntactic details (such as handling the error thrown by makeURLRequest) are omitted for simplicity
  // make the URLRequest
  let urlRequest = requestMaker.makeURLRequest(with: request)
  // adapt request
  var adaptedRequest = urlRequest
  requestAdapters.forEach {
    adaptedRequest = $0.adapt(request: adaptedRequest)
  }
  // execute request
  let result = await requestExecutor.execute(adaptedRequest)
  // parse the result and return
  let parsedResult = responseParser.parseResult(result)
  return parsedResult
}

We could keep the method and Networking lean by delegating all the work to sub-entities as shown above.

Swift concurrency pitfalls

We tried returning the result in the main actor to prevent consumers from having to switch to the main actor to update their UI, similar to using a completion handler on the main queue in GCD.

We were initially skeptical about adding the @MainActor annotation to the method since it only ensures that the method itself runs on the main actor, not the caller. However, we were surprised to find that the callers of the function continued to run on the main actor within their Task block, so we didn’t need to specify those tasks to run on the main actor, or so we thought.

It turns out, before Swift 5.7, the behavior was non-deterministic, and after building our project with Xcode 14.0, we experienced local crashes due to UI updates on a background thread. We then removed the @MainActor annotation and updated networking interactions on the call site.

This change allowed consumers to switch to the main actor only when necessary, and to continue doing background work after retrieving the result from networking.
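A minimal sketch of that call-site pattern, assuming a non-isolated async function and a main-actor-isolated consumer. loadGreeting, GreetingViewModel and refresh are hypothetical names used only for illustration.

```swift
import Foundation

// Hypothetical stand-in for a non-isolated networking call.
func loadGreeting() async -> Result<String, Error> {
    .success("Hello")
}

// Hypothetical UI-facing type; all of its members are isolated to the main actor.
@MainActor
final class GreetingViewModel {
    private(set) var text = ""

    func show(_ value: String) {
        text = value
    }
}

// Not annotated with @MainActor: the result can be post-processed off the main actor,
// and only the final UI update hops to the main actor explicitly.
func refresh(_ viewModel: GreetingViewModel) async {
    let result = await loadGreeting()
    let processed = (try? result.get())?.uppercased() ?? "-"
    await viewModel.show(processed)
}
```

The explicit await on viewModel.show makes the hop to the main actor visible at the call site, instead of relying on where the networking method happens to resume.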

URLSession Invalidation

The Networking was functioning as intended at this stage, but we noticed the instance was still present in the memory after its intended scope. Something was wrong.

Although we initially found no apparent cause for a memory leak, we later realized that the system could cache the URLSession for future use, retaining its delegate in the process. To prevent this behavior, we ensured proper deallocation by invalidating the URLSession in the deinit method:

deinit {
  session.finishTasksAndInvalidate()
}

Usage

Now that we are done with the Networking implementation, let’s take a look at the complete usage, from request creation to result retrieval.


let networking: NetworkRequestProviding

func makeRequest() {
  Task {
    let parameters = SampleRequestParameters(testProperty: "test", secondTestProperty: "test2")
    let request = SampleRequest(parameters: parameters)

    let result = await networking.executeRequest(
        request: request,
        responseType: ExampleType.self // A decodable
    )

    switch result {
      case .success(let successWrapper):
        await updateUI(with: successWrapper.expectedResponse)
      case .failure(let error):
        await showError(error)
    }
  }
}

@MainActor
func updateUI(with model: ExampleType) {
  // update UI safely on main actor
}

@MainActor
func showError(_ error: NetworkingError) {
}

Swift concurrency makes requesting and processing results straightforward and readable, especially when compared to closure-based concurrency APIs.

Mock networking for UI tests

We also made a very cool Mock Networking feature with the ability to load mock JSON data, but that’s a story for another time — stay tuned for our next article! :)

Final words

We’re happy to have created a modern version of our Networking stack! We’ve built a highly scalable and easy-to-use library that’s been rigorously unit tested and leverages the new Swift concurrency. Along the way, we’ve learned a ton by overcoming various issues and challenges.

Thanks for reading this article — we hope you found it helpful! We’d love to hear your thoughts, so please join the discussion in the comments section. Until next time, take care! 👋

Writing a modern iOS Networking Library with Swift Concurrency was originally published in Getir on Medium, where people are continuing the conversation by highlighting and responding to this story.

Use Convenience Init To Avoid Making Entities Public in Your Package in Swift

Emre Havan — Tue, 10 Jan 2023 22:25:00 GMT

Create APIs that are more easily tested

Continue reading on Better Programming »

Implementing a Tracking System for iOS with CoreData

Emre Havan — Mon, 26 Oct 2020 14:31:02 GMT

An efficient implementation

Originally published at https://freeletics.engineering on June 22, 2020.

As iOS developers, we often need to implement tracking in our applications. There are many third-party frameworks that would allow us to implement tracking systems in our projects. But in this article, we are going to talk about how we have implemented our custom tracking infrastructure at Freeletics with the help of CoreData, without using any third-party framework.

Our system will save each event generated by users, store them temporarily, and once the number of stored events reaches the defined limit, all the events are sent to the server. The client side tracking infrastructure is composed of three main entities: storage, batcher and sender.

TrackingEventStorage: Responsible for storing, fetching and deleting events using CoreData.
TrackingEventsBatcher: Responsible for batching events and acting as a layer of communication between TrackingEventStorage and TrackingEventSender.
TrackingEventSender: Responsible for sending a list of events to the server.

For the event itself, we have two different models. One is to store them in CoreData as NSManagedObject (ManagedInHouseTrackingEvent) and the other one is as a simple struct (InHouseTrackingEvent) to easily initialise from the consumer side and later to send to the backend. Our models look like the following:

ManagedInHouseTrackingEvent:

@objc(ManagedInHouseTrackingEvent)
public final class ManagedInHouseTrackingEvent: NSManagedObject {

}

extension ManagedInHouseTrackingEvent {

    @nonobjc public class func fetchRequest() -> NSFetchRequest<ManagedInHouseTrackingEvent> {
        return NSFetchRequest<ManagedInHouseTrackingEvent>(entityName: String(describing: ManagedInHouseTrackingEvent.self))
    }

    @NSManaged public var name: String?
    @NSManaged public var properties: Data?
    @NSManaged public var id: String?
}

extension  ManagedInHouseTrackingEvent {
    enum PropertyKey: String {
        case id
        case name
        case properties
    }
}

InHouseTrackingEvent:

struct InHouseTrackingEvent {
    let id: String
    let name: String
    let properties: [String: Any]
}

We normally do not need an id property for our events, but we will use it later while creating core data event models so that we can distinguish persisted events from each other later on.

As you can see, the properties field is of type Data in our managed model, whereas it is a [String: Any] dictionary in InHouseTrackingEvent. Since we are just going to use managed models to persist data rather than manipulating any existing ones, we are just going to convert properties to Data to easily persist them as Binary Data with CoreData.
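As a rough illustration of that conversion, the following sketch round-trips a properties dictionary through Data with JSONSerialization. The helper names encodeProperties and decodeProperties are hypothetical.

```swift
import Foundation

/// Converts a properties dictionary to Data, or returns nil if it cannot be represented as JSON.
func encodeProperties(_ properties: [String: Any]) -> Data? {
    // Checking validity first avoids handing unsupported values (such as Date) to JSONSerialization.
    guard JSONSerialization.isValidJSONObject(properties) else { return nil }
    return try? JSONSerialization.data(withJSONObject: properties)
}

/// Converts persisted Data back to a properties dictionary.
func decodeProperties(_ data: Data) -> [String: Any]? {
    (try? JSONSerialization.jsonObject(with: data)) as? [String: Any]
}
```

One consequence of this design is that properties may only contain JSON-compatible values (strings, numbers, booleans, arrays, dictionaries and null); anything else has to be converted before the event is created.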

Event Storage Implementation

After creating our models, and also xcdatamodel related to ManagedInHouseTrackingEvent, now we will continue with the core data stack.

We need to have a core data stack, where our storage class can initialise the managed context from its persistent container. Later we use this managed context to initialise NSEntityDescription that will describe our entity, and to interact with all CRUD operations for the database.

InHouseTrackingCoreDataStack:

final class InHouseTrackingCoreDataStack {

    static let shared = InHouseTrackingCoreDataStack()
    private let containerName = "FreeleticsInHouseTracking"

    private init() {}

    lazy var persistentContainer: NSPersistentContainer = {
        let container = NSPersistentContainer(name: containerName)
        container.loadPersistentStores(completionHandler: { [weak self] (_, error) in
            if let self = self,
                let error = error as NSError? {
                print("Error!")
            }
        })
        return container
    }()
}

TrackingEventStorage:

Now we can create TrackingEventStorage class. It will have four properties:

entityName: Representing the class name for our core data model.
coreDataStack: A reference to our core data stack.
managedContext: An NSManagedObjectContext which will be used to wrap all CRUD operations for core data.
eventEntity: An NSEntityDescription representing our core data model.

final class TrackingEventsStorage {

    let managedContext: NSManagedObjectContext
    let eventEntity: NSEntityDescription?

    private let entityName = "ManagedInHouseTrackingEvent"
    private let coreDataStack = InHouseTrackingCoreDataStack.shared

    init() {
        managedContext = coreDataStack.persistentContainer.newBackgroundContext()
        eventEntity = NSEntityDescription.entity(forEntityName: entityName,
                                                 in: managedContext)
    }
}

When we initialize the managedContext by using newBackgroundContext() from the persistent container, it will have the concurrencyType of privateQueueConcurrencyType. We want to have a dedicated managedContext so that whenever a database operation is done within, it will make sure every operation is executed on the same queue. We need this since CoreData is not thread-safe by default [1]. This will later allow us to safely interact with the tracking system regardless of what thread we are on. Moreover, we will be executing all core data related code inside a performAndWait [2] closure of the managedContext. This will make sure all our operations will be executed synchronously. We need synchronicity since many of our actions will be depending on each other, such as making sure to check stored events after storing a new event.

We are going to implement three public methods for this class to interact with.

func storeEvent(_ event: InHouseTrackingEvent)
func removeEvents(_ events: [InHouseTrackingEvent])
func storedEvents(withMaximumAmountOf limit: Int?) -> [InHouseTrackingEvent]?

But before that we need to implement some private helper methods the public methods will benefit from.

First, we will need to implement a method to execute a given fetch request, which will perform the given request and return its results.

private func performFetchRequest(_ request: NSFetchRequest<NSFetchRequestResult>) -> [NSManagedObject]? {
    var objects: [NSManagedObject]?

    managedContext.performAndWait {
        do {
            objects = try managedContext.fetch(request) as? [NSManagedObject]
        } catch {
            print("Error!")
        }
    }
    return objects
}

We also need a method to create a fetch request to perform, which will have two parameters:

identifiers: An optional array of identifiers to look for.
limit: An optional integer to set the limit of the fetch request.

private func makeFetchRequest(withIDs identifiers: [String]? = nil,
                              withMaximumAmountOf limit: Int? = nil) -> NSFetchRequest<NSFetchRequestResult> {
    let request = NSFetchRequest<NSFetchRequestResult>(entityName: entityName)
    if let identifiers = identifiers {
        request.predicate = NSPredicate(format: "id IN %@", identifiers)
    }
    if let limit = limit {
        request.fetchLimit = limit
    }
    return request
}

Next, we will implement the coreDataObjects method, which retrieves stored NSManagedObjects by calling both the makeFetchRequest and performFetchRequest methods. It has two parameters:

identifiers: An optional array of identifiers to look for.
limit: An optional integer to set the limit of the fetch request.

private func coreDataObjects(withIDs identifiers: [String]? = nil,
                             withMaximumAmountOf limit: Int? = nil) -> [NSManagedObject]? {
    let request = makeFetchRequest(withIDs: identifiers,
                                   withMaximumAmountOf: limit)

    return performFetchRequest(request)
}

Another component we are going to need is a method to get InHouseTrackingEvent events from stored managed object events before providing those to upper-level APIs. We are going to create a factory class with makeEvent method for it as following:

final class InHouseTrackingEventFactory {

    typealias Keys = ManagedInHouseTrackingEvent.PropertyKey

    /// Initializes and returns an `InHouseTrackingEvent` from the given NSManagedObject
    /// - Returns: Returns an InHouseTrackingEvent from NSManagedObject or nil if any error occurs
    static func makeEvent(from object: NSManagedObject) -> InHouseTrackingEvent? {
        do {
            guard let propertiesData = object.value(forKey: Keys.properties.rawValue) as? Data,
                let properties = try JSONSerialization.jsonObject(with: propertiesData) as? [String: Any],
                let id = object.value(forKey: Keys.id.rawValue) as? String,
                let name = object.value(forKey: Keys.name.rawValue) as? String else {
                    return nil
            }
            return InHouseTrackingEvent(id: id,
                                        name: name,
                                        properties: properties)
        } catch {
            print("Error!")
            // Every code path has to return a value.
            return nil
        }
    }
}

Now we can add a method in TrackingEventStorage to convert all given managed object events into InHouseTrackingEvent:

private func events(from coreDataObjects: [NSManagedObject]) -> [InHouseTrackingEvent]? {
    var events = [InHouseTrackingEvent]()
    managedContext.performAndWait {
        for coreDataObject in coreDataObjects {
            if let event = InHouseTrackingEventFactory.makeEvent(from: coreDataObject) {
                events.append(event)
            }
        }
    }
    return events.isEmpty ? nil : events
}

Finally, we will implement a saveContext method to make sure any changes we made will be persisted in the database:

private func saveContext() {
    managedContext.performAndWait {
        do {
            guard managedContext.hasChanges else {
                return
            }
            try managedContext.save()
        } catch {
            print("Error!")
        }
    }
}

Now we are ready to implement our public methods mentioned before. These methods will allow other entities to interact with our core tracking mechanism.

Let's add a typealias to TrackingEventStorage class that we will use for our managed models property keys:

typealias Keys = ManagedInHouseTrackingEvent.PropertyKey

The first public method we are going to implement is storeEvent, which will persist given InHouseTrackingEvent as an NSManagedObject.

func storeEvent(_ event: InHouseTrackingEvent) {
    guard let eventEntity = eventEntity else {
        return
    }

    managedContext.performAndWait {
        let managedEvent = NSManagedObject(entity: eventEntity, insertInto: managedContext)
        managedEvent.setValue(event.id, forKey: Keys.id.rawValue)
        managedEvent.setValue(event.name, forKey: Keys.name.rawValue)

        do {
            let propertyData = try JSONSerialization.data(withJSONObject: event.properties)
            managedEvent.setValue(propertyData, forKey: Keys.properties.rawValue)
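        } catch {
            // Discard the half-built object so it is not saved without its properties.
            managedContext.delete(managedEvent)
            print("Error!")
        }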
    }

    saveContext()

}

The second one is removeEvents, which accepts an array of InHouseTrackingEvent and removes the corresponding managed model for each event in the array.

func removeEvents(_ events: [InHouseTrackingEvent]) {
    let eventIDs = events.map { $0.id }

    guard let coreDataObjects = coreDataObjects(withIDs: eventIDs) else {
        return
    }

    managedContext.performAndWait {
        coreDataObjects.forEach { self.managedContext.delete($0) }
    }

    saveContext()
}

The last public method is storedEvents, which accepts a limit parameter and returns at most that many stored events, converted to InHouseTrackingEvent.

func storedEvents(withMaximumAmountOf limit: Int?) -> [InHouseTrackingEvent]? {
    guard let objects = coreDataObjects(withMaximumAmountOf: limit) else {
        return nil
    }

    return events(from: objects)
}

Event Sender Implementation

We are going to omit implementation details for the event-sending class for simplicity. InHouseTrackingEventSender is going to have a method to send events that accepts an array of InHouseTrackingEvent and makes a URL request to send them to the backend. Moreover, it is going to have a weak delegate property of type TrackingEventSenderDelegate, which is notified once events have been successfully submitted to the backend. As you probably noticed, errors are not handled explicitly. If something goes wrong, we simply do nothing and send the same events later on.

InHouseTrackingEventSender:

final class InHouseTrackingEventSender {

    weak var delegate: TrackingEventSenderDelegate?

    func sendEvents(_ events: [InHouseTrackingEvent]) {
        // Make sure there are no ongoing requests and make a
        // post request to the backend by including each event in
        // the body of the request.

        // success:
        delegate?.didSendEvents(events)

        // error:
        // Handle error
    }
}

TrackingEventSenderDelegate:

protocol TrackingEventSenderDelegate: AnyObject {
  func didSendEvents(_ events: [InHouseTrackingEvent])
}

Event Batcher Implementation

It is time for us to implement the last part of our tracking service. We need a batching mechanism to make sure our tracking system will work by taking performance, battery, and real-time tracking into account. By providing a batch size, we will try to have an ideal balance between performance and real-time tracking by not triggering a URL request for each event stored, but only triggering once the stored event number meets the batch size. It is going to be a singleton and going to be used to directly track an event. Before implementing the batcher singleton, let's write a simple struct which will be responsible for providing the batch size. We could hardcode this value but providing it via another entity can make it easier and clearer to maintain this information, especially if it can be updated via remote configurations.

struct TrackingEventsBatchSizeProvider {
    let defaultBatchSize = 20
}

extension TrackingEventsBatchSizeProvider: TrackingEventsBatchSizeProviding {
    var batchSize: Int {
        // We just return default size for simplicity but we could get some remote config value
        // at this point and provide it as well.
        return defaultBatchSize
    }
}
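The TrackingEventsBatchSizeProviding protocol itself is not shown above. A minimal sketch of what it could look like, together with a provider that prefers a remotely configured value, is given here; both the protocol shape and RemoteConfigBatchSizeProvider are assumptions.

```swift
// Assumed shape of the protocol used by the batcher.
protocol TrackingEventsBatchSizeProviding {
    var batchSize: Int { get }
}

// Hypothetical provider: uses a remote value when it is present and sensible,
// and falls back to the default otherwise.
struct RemoteConfigBatchSizeProvider: TrackingEventsBatchSizeProviding {
    let defaultBatchSize = 20
    let remoteValue: () -> Int?

    var batchSize: Int {
        guard let value = remoteValue(), value > 0 else {
            return defaultBatchSize
        }
        return value
    }
}
```

Since the batcher only depends on the protocol, swapping the hard-coded provider for a remote-config-backed one would not require any change in the batcher itself.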

TrackingEventsBatcher:

Now we can create the batcher singleton, TrackingEventsBatcher.

It will have four properties:

shouldBatchEvents: A boolean to indicate if events should be batched or sent immediately.
eventStorage: An instance of TrackingEventStorage.
eventSender: An instance of InHouseTrackingEventSender.
batchSizeProvider: A struct to provide how big the batch size should be.

It will also conform to TrackingEventSenderDelegate to set itself as the delegate of the initialised event sender class.

final class TrackingEventsBatcher: TrackingEventSenderDelegate {

    static let shared = TrackingEventsBatcher()

    var shouldBatchEvents = true

    private var eventStorage: TrackingEventStoring
    private var eventSender: TrackingEventSending
    private var batchSizeProvider: TrackingEventsBatchSizeProviding

    init(eventStorage: TrackingEventStoring = TrackingEventsStorage(),
         eventSender: TrackingEventSending = InHouseTrackingEventSender(),
         batchSizeProvider: TrackingEventsBatchSizeProviding = TrackingEventsBatchSizeProvider()) {
        self.eventStorage = eventStorage
        self.eventSender = eventSender
        self.batchSizeProvider = batchSizeProvider
        self.eventSender.delegate = self
    }

    func didSendEvents(_ events: [InHouseTrackingEvent]) {
        // empty for now
    }
}

As you can see, shouldBatchEvents is not private, so it can be modified later from outside the class. This flag allows us to either submit tracked events immediately or batch them until we hit the batch size. For the simplicity of this article, it will always be true.

Now we will add 2 helper private methods, the first one is to determine if the events should be sent, and another one to send events if needed:

private func shouldSendEvents(_ events: [InHouseTrackingEvent]) -> Bool {
    // Send events right away if they shouldn't be batched, regardless of their number.
    // Otherwise, send them only once their number has reached the batch size.
    return !shouldBatchEvents || events.count >= batchSizeProvider.batchSize
}

private func sendEventsIfNeeded() {
    guard let storedEvents = eventStorage.storedEvents(withMaximumAmountOf: batchSizeProvider.batchSize),
        shouldSendEvents(storedEvents) else {
            return
    }
    eventSender.sendEvents(storedEvents)
}

We first fetch stored events with batchSize limit and then see if we should be sending events already.

Now we will implement the method which will be the entry point of our whole tracking infrastructure, the following method will be called throughout the application where an entity needs to track an event.

func batchEvent(_ event: InHouseTrackingEvent) {
    eventStorage.storeEvent(event)
    sendEventsIfNeeded()
}

Whenever an event is tracked through batchEvent, we will store it and check if events should be sent.

Finally, we will update didSendEvents method as following:

func didSendEvents(_ events: [InHouseTrackingEvent]) {
    eventStorage.removeEvents(events)

    sendEventsIfNeeded()
}

We make sure all submitted events are removed from storage and check if more events should be sent. This logic is needed because the number of stored events might have been more than twice the batch size. This can occur when the app is used offline and no events have been sent for a while.
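The backlog case can be simulated with a simplified, in-memory version of the flow. Event, InMemoryStorage and Batcher below are hypothetical stand-ins for the real classes, and the network call is replaced by appending to an array.

```swift
struct Event: Equatable {
    let id: Int
}

// In-memory stand-in for the CoreData-backed storage.
final class InMemoryStorage {
    private(set) var events: [Event] = []

    func store(_ event: Event) { events.append(event) }
    func remove(_ sent: [Event]) { events.removeAll { sent.contains($0) } }
    func stored(limit: Int) -> [Event] { Array(events.prefix(limit)) }
}

final class Batcher {
    let batchSize: Int
    let storage = InMemoryStorage()
    var isOnline = true
    private(set) var sentBatches: [[Event]] = []

    init(batchSize: Int) {
        self.batchSize = batchSize
    }

    func batchEvent(_ event: Event) {
        storage.store(event)
        sendEventsIfNeeded()
    }

    private func sendEventsIfNeeded() {
        let batch = storage.stored(limit: batchSize)
        guard isOnline, batch.count >= batchSize else { return }
        sentBatches.append(batch) // stands in for the successful network call
        didSendEvents(batch)
    }

    private func didSendEvents(_ events: [Event]) {
        storage.remove(events)
        sendEventsIfNeeded() // keeps draining while a full batch is still stored
    }
}

// Five events are tracked while offline, then a sixth one arrives while online.
let batcher = Batcher(batchSize: 2)
batcher.isOnline = false
(1...5).forEach { batcher.batchEvent(Event(id: $0)) }
batcher.isOnline = true
batcher.batchEvent(Event(id: 6))
```

With a batch size of 2, the sixth event triggers three consecutive batches, which is only possible because didSendEvents checks the storage again after every successful submission.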

Usage

Let's see how we can interact with the system with a sample class:

class SampleEntity {
    func trackSomething() {
        let eventName = "example_event"
        let id = "\(eventName)_\(Date().timeIntervalSince1970)"
        let properties: [String: Any] = [
            "propertyOne": "1",
            "propertyTwo": true
        ]
        let event = InHouseTrackingEvent(id: id,
                                         name: eventName,
                                         properties: properties)
        TrackingEventsBatcher.shared.batchEvent(event)
    }
}

Usually, we have different entities for different events in our applications and this manual conversion of properties can be prevented by providing a mechanism to convert properties into required dictionary format through event entities. But for simplicity, we just add two random properties and show how it can be batched here. We could also implement a wrapper function called track, which could internally handle batching as well.
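One possible shape for such a mechanism is sketched here. TrackableEvent, WorkoutCompletedEvent and the track function are hypothetical, the UUID-based id is an assumption that differs from the timestamp-based id used above, and the batch closure stands in for TrackingEventsBatcher.shared.batchEvent.

```swift
import Foundation

struct InHouseTrackingEvent {
    let id: String
    let name: String
    let properties: [String: Any]
}

// Hypothetical protocol: each event entity knows how to describe itself.
protocol TrackableEvent {
    var name: String { get }
    var properties: [String: Any] { get }
}

// Hypothetical typed event; the dictionary conversion lives in one place.
struct WorkoutCompletedEvent: TrackableEvent {
    let workoutName: String
    let durationInSeconds: Int

    let name = "workout_completed"
    var properties: [String: Any] {
        ["workout_name": workoutName, "duration": durationInSeconds]
    }
}

/// Wraps the conversion and the batching behind a single call.
func track<E: TrackableEvent>(_ event: E, using batch: (InHouseTrackingEvent) -> Void) {
    let inHouseEvent = InHouseTrackingEvent(id: UUID().uuidString,
                                            name: event.name,
                                            properties: event.properties)
    batch(inHouseEvent)
}
```

A call site would then only create the typed event and pass it to track, without building dictionaries or identifiers by hand.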

Further improvements for TrackingEventStorage

There are a few more things we need to consider for the TrackingEventStorage. Especially for the saveContext() method. There is a property named isProtectedDataAvailable which lives inside UIApplication. This property will help us to determine if there is data protection active or the device is locked. For such cases we should not attempt to do database operations, otherwise, we might experience some crashes [3].

Let’s add the check for this property as we check if there are any changes as well (in saveContext):

guard UIApplication.shared.isProtectedDataAvailable,
    managedContext.hasChanges else {
    return
}

One could expect this to work right away, but now we have another problem. We have implemented our tracking mechanism to be thread-safe, but UIApplication.shared.isProtectedDataAvailable should only be read from the main queue. Thus, we need to check which queue we are on before attempting to read this value, and synchronise with the main queue if necessary. We could just check Thread.isMainThread, but we are going to go with a different solution instead, since this check alone might not be enough to safely determine whether we need to synchronise with the main queue [4].

We are going to use a refactored version of the Stack Overflow answer referenced in the code below to properly determine which dispatch queue we are running on.

DispatchQueue extension:

import Foundation

// Reference https://stackoverflow.com/a/60314121/8447312
public extension DispatchQueue {

    static var current: DispatchQueue? { getSpecific(key: key)?.queue }

    private struct QueueReference {
        weak var queue: DispatchQueue?
    }

    private static let key: DispatchSpecificKey<QueueReference> = {
        let key = DispatchSpecificKey<QueueReference>()
        setUpSystemQueuesDetection(key: key)
        return key
    }()

    private static func setUpSystemQueuesDetection(key: DispatchSpecificKey<QueueReference>) {
        let queues: [DispatchQueue] = [
            .main,
            .global(qos: .background),
            .global(qos: .default),
            .global(qos: .unspecified),
            .global(qos: .userInitiated),
            .global(qos: .userInteractive),
            .global(qos: .utility)
        ]
        registerDetection(of: queues, key: key)
    }

    private static func registerDetection(of queues: [DispatchQueue], key: DispatchSpecificKey<QueueReference>) {
        queues.forEach {
            $0.setSpecific(key: key,
                           value: QueueReference(queue: $0))
        }
    }
}

Now we will add a new method to TrackingEventsStorage that reads isProtectedDataAvailable safely:

private func isProtectedDataAvailable() -> Bool {
    var isProtectedDataAvailable = false

    if DispatchQueue.current == DispatchQueue.main {
        isProtectedDataAvailable = UIApplication.shared.isProtectedDataAvailable
    } else {
        DispatchQueue.main.sync {
            isProtectedDataAvailable = UIApplication.shared.isProtectedDataAvailable
        }
    }
    return isProtectedDataAvailable
}

Let’s change the saveContext method to the following, to make sure UIApplication.shared.isProtectedDataAvailable is true before saving the context.

private func saveContext() {
    let protectedDataAvailable = isProtectedDataAvailable()
    managedContext.performAndWait {
        do {
            guard protectedDataAvailable,
                managedContext.hasChanges else {
                    return
            }
            try managedContext.save()
        } catch {
            print("Error!")
        }
    }
}

One thing to note is that we switch queues, if necessary, outside of the performAndWait closure. This is needed because perform and performAndWait closures should only be used for work related to NSManagedObjects.

Conclusion

In this article, we have seen how CoreData can be used for a custom event tracking system implementation. We have built a system to persist events temporarily on the device and submit them to the backend in batches. We have also made sure that such a system can be accessed from different threads/queues and explored ways of properly determining the current queue of the execution.

Use Enums and Associated Values to Parse JSON in Swift

Emre Havan — Mon, 03 Feb 2020 20:37:55 GMT

Parse items with different key-value pairs

Continue reading on Better Programming »

How to deploy a Review Classifier in ANY application

Emre Havan — Mon, 20 Jan 2020 07:12:01 GMT

You can classify your dataset but how to make it useful for future applications?

Continue reading on TDS Archive »

Recommender System Application Development

Emre Havan — Sat, 07 Dec 2019 00:01:31 GMT

Cosine Similarity, Rating thresholding and other custom techniques

Continue reading on TDS Archive »

Recommender System Application Development Part 1/4: Cosine Similarity

Emre Havan — Thu, 05 Dec 2019 23:18:38 GMT

Developing a recommender system application with Cosine Similarity, a rating threshold and mathematical formulas

Source

Hello,

In this article, we are going to develop a recommender application (Recommender System, Recommender Engine) using Python, Cosine Similarity and other mathematical functions. This application covers part of the application I developed for my master's graduation project. If you wish, you can access the source code of my graduation project here.

There are many ways to build a recommender application, but the methods we examine here follow an approach that offers a solution to the Cold Start problem. The Cold Start problem is the problem of making recommendations when we have no information about the user (for example, a newly registered user). In this project, we will try to make sensible recommendations from the existing data while taking very little information from the user (for example, the user selecting only a category).

Requirements:

Python 3
numpy
pandas
nltk

In this article I will assume that you know the Python programming language. Also, since the real goal of this series is to build a recommender system that makes recommendations with different formulas and techniques, I will not go into much detail about the Python code. I am not very experienced in Python either, so some parts could probably have been written better. If there is a part you would like me to change, I would be glad if you let me know in a comment :)

We will develop 4 different versions, improving our recommender system from a different angle in each one and making it more comprehensive. In our first version, we will make recommendations using Cosine Similarity.

The recommender application we are going to write will be a system that recommends 5 different cities for a given trip type. Of course, by applying the methods we follow here, it is also possible to build systems that make other kinds of recommendations.

You can download the data set the system is based on here.

Some of the features in our data set are real data, while others consist of random data I added for testing purposes. Overall, our data set consists of 25 cities. Each entry in the set has the following features: city, popularity, description, image, rating, rating_count, positive_review, negative_review. The features and their values for the first 5 cities are shown in the image below.

City data set summary

Of the features I mentioned above, city, popularity, description and image reflect the city information on the Tripadvisor website (of course, the information will diverge from the site over time). The remaining features, rating, rating_count, positive_review and negative_review, contain random values. Let's go through what each feature is for, one by one.

city: City name
popularity: Number of reviews recorded for the city
description: A short blog-style text giving information about the city
image: URL of the city's background image
rating: The average rating of the city, on a 0–10 scale
rating_count: Number of users who rated the city
positive_review: Number of positive reviews submitted
negative_review: Number of negative reviews submitted

Now that we know our data set and the features we have, we can get started. In our first version, we will proceed using only the city and description features.

Version-1

The first version we develop will be a system that makes recommendations based only on the description feature of the entries in the data set. This system computes the cosine similarity between the keywords associated with the trip type the user selects and the city descriptions, and recommends the 5 cities with the highest values to the user.

Cosine Similarity

We will compute the similarity mentioned above using Cosine Similarity. Cosine Similarity is a method of measuring the similarity between two vectors by computing the cosine of the angle between them in a multi-dimensional space. It is obtained by dividing the dot product of the two vectors by the product of their magnitudes. Below you can see the Cosine Similarity formula and a figure that visualizes the angle between two vectors. The smaller the angle between 2 vectors, the more similar we can say these two vectors are. We will come back to this method later in the project, but for more detailed information you can look at references [1], [2].

Cosine Similarity formula for vectors A and B

Similarity between vectors A and B in 3-dimensional space
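To make the definition concrete, here is a minimal sketch that applies it to small numeric vectors: the dot product divided by the product of the magnitudes. The function name and the sample vectors are chosen for illustration only and are not part of the project code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot_product = sum(x * y for x, y in zip(a, b))
    magnitude = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Two vectors with no length have no direction to compare; return 0.0
    return dot_product / magnitude if magnitude else 0.0

a = [1, 1, 0]
b = [1, 0, 1]

same_direction = cosine_similarity(a, [2, 2, 0])        # parallel vectors, angle 0
partial = cosine_similarity(a, b)                        # one shared dimension
orthogonal = cosine_similarity([1, 0, 0], [0, 1, 0])    # nothing in common

print(same_direction, partial, orthogonal)
```

Up to floating-point rounding, the three values come out as 1, 0.5 and 0: the smaller the angle, the closer the score is to 1.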

Preprocessing

First, we need to perform some operations to get our data set ready for use. Let's create a Python file named pre_processing.py in the same folder as our data set.

Since the recommender system we develop first will make recommendations based only on the description feature, we need to clean up the description features of the entries in our data set a bit.

First, we will remove the so-called stop words from the city descriptions. Stop words are words that occur frequently in every text but do not carry much meaning on their own (the, for, an, a, or, what, etc.). Our goal is to prevent the similarity score from dropping for city descriptions in which such unnecessary words occur very often. Because every word in the descriptions forms a separate dimension in the space used for the cosine similarity computation, such words can affect the similarity score negatively.

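import numpy as np
import pandas as pd
from nltk.corpus import stopwords

# Load the English stop words once, as a set, instead of once per word.
# The NLTK "stopwords" corpus must be available (nltk.download('stopwords')).
english_stopwords = set(stopwords.words('english'))

def clear(city):
    # Lowercase the description and split it into words
    city = city.lower()
    city = city.split()
    # Keep only the words that are not stop words
    city_keywords = [word for word in city if word not in english_stopwords]

    merged_city = " ".join(city_keywords)
    return merged_city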

Thanks to the clear method above, we can clean the city descriptions in our data set. The method works as follows, in order:

It accepts a String parameter named city
It first lowercases all letters in the received String with the .lower() method
It then separates all the words with .split(), creating a list of Strings
It then removes all stop words from this list, giving us the city_keywords variable
Finally, it joins all these words into a single String with spaces between them (merged_city) and returns it to the caller.
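The steps above can be tried out without downloading the NLTK corpus. The sketch below is a simplified variant of clear, not the project code: clear_with and SAMPLE_STOPWORDS are hypothetical names, and the stop-word set is a small hand-picked sample rather than NLTK's full English list.

```python
# Hypothetical, hand-picked stop words; the article uses NLTK's full English list.
SAMPLE_STOPWORDS = {"the", "for", "an", "a", "or", "what", "is", "of", "and"}

def clear_with(description, stop_words):
    """Lowercase, split on whitespace, drop stop words, re-join with spaces."""
    words = description.lower().split()
    return " ".join(word for word in words if word not in stop_words)

cleaned = clear_with("Athens is a City of History and the Arts", SAMPLE_STOPWORDS)
print(cleaned)  # athens city history arts
```

Only the four words that say something about the city survive, which is what we want to feed into the similarity computation.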

Now let's apply this method to every city in our data set to clean the descriptions. Let's add the following code block below the clear method:

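# df is assumed to be the pandas DataFrame that holds the city data set
for index, row in df.iterrows():
    clear_desc = clear(row['description'])
    df.at[index, 'description'] = clear_desc

# to_csv returns None when it is given a path, so there is nothing to assign
df.to_csv('city_data_cleared.csv')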

This code block cleans each city description and saves the cleaned data to a file named city_data_cleared.csv. We will use that data set in our subsequent steps.

pre_processing.py gist:

https://medium.com/media/a0503ffe039790bbd29611fd04587551/href

Computing similarity with Cosine Similarity

Now that we have cleaned the city descriptions, we can start writing the method that will compute the similarity score. Let's create a Python file named cosine_similarity.py.

As I mentioned earlier, the method we write will give a score based on the similarity of the words in two separate strings. First, we will convert these two strings into vectors made up of words. Every word that appears in either vector then forms a separate dimension in the space, and if a word that exists in one vector is missing from the other, the value of that word's dimension in the other vector will be 0.
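As a small illustration of this idea, the sketch below turns two short sample texts (made up for this example) into count vectors over a shared vocabulary. It uses whitespace splitting for brevity.

```python
from collections import Counter

text1 = "history art museum art"
text2 = "beach art sun"

counts1 = Counter(text1.split())
counts2 = Counter(text2.split())

# Every distinct word in either text becomes one dimension of the shared space
vocabulary = sorted(set(counts1) | set(counts2))

# A Counter returns 0 for a missing key, which gives the zero entries
# for words that appear in only one of the two texts
vector1 = [counts1[word] for word in vocabulary]
vector2 = [counts2[word] for word in vocabulary]

print(vocabulary)  # ['art', 'beach', 'history', 'museum', 'sun']
print(vector1)     # [2, 0, 1, 1, 0]
print(vector2)     # [1, 1, 0, 0, 1]
```

The only dimension in which both vectors are non-zero is art, so that is the only word that contributes to the dot product.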

Note: Cosine similarity is a method that is not affected much by a word occurring more than once in a text. For our purposes it is enough that a word occurs once, so this does not matter much, but if you want to give weight to this in another application in the future, I recommend looking at the Pearson correlation method.

To make it more generic and reusable, let's write the code in our cosine_similarity.py file under a class named CosineSimilarity:

import re, math
from collections import Counter

class CosineSimilarity:
    def __init__(self):
        print("Cosine Similarity initialized")
    
    @staticmethod
    def cosine_similarity_of(text1, text2):
        # Split both texts into words
        first = re.compile(r"[\w']+").findall(text1)
        second = re.compile(r"[\w']+").findall(text2)
        # Map each word to the number of times it occurs
        vector1 = Counter(first)
        vector2 = Counter(second)

        # Words that appear in both texts
        common = set(vector1.keys()).intersection(set(vector2.keys()))

        dot_product = 0.0

        for i in common:
            dot_product += vector1[i] * vector2[i]

        squared_sum_vector1 = 0.0
        squared_sum_vector2 = 0.0

        for i in vector1.keys():
            squared_sum_vector1 += vector1[i]**2

        for i in vector2.keys():
            squared_sum_vector2 += vector2[i]**2

        magnitude = math.sqrt(squared_sum_vector1) * math.sqrt(squared_sum_vector2)

        # Avoid dividing by zero when either text contains no words
        if not magnitude:
            return 0.0
        else:
            return float(dot_product) / magnitude

The cosine_similarity_of method we wrote works as follows:

First, it splits the two given Strings into their words with the help of a regex
Then, for each of the two strings, it creates a dictionary containing the words and how many times they occur (e.g. nice: 4)
Then it obtains the words common to both vectors
It computes the similarity following the formula given under the Cosine Similarity heading above and returns it at the end of the method

cosine_similarity.py gist:

https://medium.com/media/6eb3030775af8767891c82194e278532/href

Writing the Recommender Engine

Now that we have cleaned our city descriptions and written our method that computes cosine similarity, we can start writing the engine that will make the recommendations for the first version.

Since we will recommend based only on the similarity score in our first version, our engine class will be small, but because we will extend the same code in later versions, it is worth starting with a separate class.

recommender_engine.py:

https://medium.com/media/3fe7ad1f1583de70c4580bdc64824044/href

The get_recommendations(keywords) method works as follows, in order:

First, it takes a String parameter named keywords, against which it computes the similarity of the city descriptions
For each city, it computes the similarity score with the given keywords and keeps these in a dictionary that maps city index to score.
It creates an empty data frame containing the city, popularity, description and score features of the cities.
It adds the 5 cities with the highest scores to this data frame
Finally, it converts this data frame to JSON and returns it.
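The steps above can be sketched as follows. This is not the gist's implementation: to stay self-contained it uses plain lists and dictionaries instead of a pandas data frame, a condensed similarity function that splits on whitespace, and a hypothetical three-city sample (CITIES) in place of city_data_cleared.csv. The cities and top_n parameters are also additions for this example.

```python
import json
import math
from collections import Counter

def cosine_similarity_of(text1, text2):
    """Word-count cosine similarity, condensed from the class shown earlier."""
    v1, v2 = Counter(text1.split()), Counter(text2.split())
    dot_product = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    magnitude = (math.sqrt(sum(c * c for c in v1.values()))
                 * math.sqrt(sum(c * c for c in v2.values())))
    return dot_product / magnitude if magnitude else 0.0

# Hypothetical, already-cleaned rows standing in for city_data_cleared.csv
CITIES = [
    {"city": "Alpha", "popularity": 100, "description": "ancient history art museums"},
    {"city": "Beta", "popularity": 200, "description": "beach sun sea nightlife"},
    {"city": "Gamma", "popularity": 300, "description": "history architecture beach"},
]

def get_recommendations(keywords, cities=CITIES, top_n=5):
    # City index -> similarity score between description and keywords
    scores = {i: cosine_similarity_of(row["description"], keywords)
              for i, row in enumerate(cities)}
    # Indices of the top_n highest scores, best first
    best = sorted(scores, key=scores.get, reverse=True)[:top_n]
    # Collect the selected rows together with their score, then serialize
    rows = [dict(cities[i], score=scores[i]) for i in best]
    return json.dumps(rows)

top = json.loads(get_recommendations("history art architecture", top_n=2))
print([(row["city"], round(row["score"], 3)) for row in top])
# [('Gamma', 0.667), ('Alpha', 0.577)]
```

Gamma ranks first even though it matches the same number of keywords as Alpha, because its description is shorter and therefore has fewer unmatched dimensions.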

Request code

Now that we have classes that make recommendations and compute cosine similarity, it is time to test them. Let's create a Python file named request.py.

We will try out our recommender application under 3 separate categories. These are:

Culture, Art and History
Beach and Sun
Nightlife and Party

By examining the city descriptions in our data set, I determined the keywords for the three categories as follows, respectively:

[history historical art architecture city culture]
[beach beaches park nature holiday sea seaside sand sunshine sun sunny]
[nightclub nightclubs nightlife bar bars pub pubs party beer]

Let's write the code that sends a request for each of the 3 categories into our request.py file as follows:

import json

from recommender_engine import RecommenderEngine

culture_keywords = "history historical art architecture city culture"
beach_n_sun_keywords = "beach beaches park nature holiday sea seaside sand sunshine sun sunny"
nightlife_keywords = "nightclub nightclubs nightlife bar bars pub pubs party beer"

def get_recommendations(keywords):
    result = RecommenderEngine.get_recommendations(keywords)
    return result

def get_top_5_city_names_out_of_json(json_string):
    # Parse the JSON string returned by the engine into a list of city records
    cities = json.loads(json_string)
    result = []
    for city in cities:
        # Keep the name and the score, as in the printed results below
        # (the 'score' key is assumed from the engine's data frame columns)
        result.append((city['city'], city['score']))

    return result

top_5_cultural_cities = get_recommendations(culture_keywords)
city_names_for_cultural = get_top_5_city_names_out_of_json(top_5_cultural_cities)
print(city_names_for_cultural)
print("#################")

top_5_summer_cities = get_recommendations(beach_n_sun_keywords)
city_names_for_summer = get_top_5_city_names_out_of_json(top_5_summer_cities)
print(city_names_for_summer)
print("#################")

top_5_party_cities = get_recommendations(nightlife_keywords)
city_names_for_party = get_top_5_city_names_out_of_json(top_5_party_cities)
print(city_names_for_party)
print("#################")

The get_recommendations method returns the recommendation JSON string for the given keywords, while the get_top_5_city_names_out_of_json method extracts the city names and scores from the returned recommendations and returns them. (The only purpose of the second method is to let us see just the city names and scores when we print, because, if you remember, recommender_engine returns several features for each city and printing all of them is unnecessary right now.)

request.py gist:

https://medium.com/media/85604a59800970da625857c8756c71c9/href

When we run the code, we get the recommendations and their scores for all 3 categories; below, however, only the results for the Culture, Art and History category are shown:

[('Athens', 0.21629522817435007),
 ('St. Petersburg', 0.16666666666666666),
 ('Stockholm', 0.14962640041614492),
 ('Milan', 0.140028008402801),
 ('Rome', 0.12171612389003691)]

The similarity score for Athens is about 21.6%, while the score for Rome is around 12.2%. The similarity scores may be lower than you expected; the reason is that the city descriptions naturally contain words other than the keywords we entered manually. Different words create different dimensions in the space, and since our keywords have no values in those dimensions, the scores are pulled down. If you add different words to the keyword list or remove some, you will see the results change.
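This effect is easy to reproduce in isolation. In the sketch below (sample texts made up for this example, condensed similarity function using whitespace splitting), both descriptions contain all three keywords, but the second one also contains six unrelated words.

```python
import math
from collections import Counter

def cosine_similarity_of(text1, text2):
    """Word-count cosine similarity, condensed from the class shown earlier."""
    v1, v2 = Counter(text1.split()), Counter(text2.split())
    dot_product = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    magnitude = (math.sqrt(sum(c * c for c in v1.values()))
                 * math.sqrt(sum(c * c for c in v2.values())))
    return dot_product / magnitude if magnitude else 0.0

keywords = "history art culture"
short_description = "history art culture"
long_description = "history art culture river bridge market harbor cafe garden"

short_score = cosine_similarity_of(short_description, keywords)
long_score = cosine_similarity_of(long_description, keywords)

print(round(short_score, 3), round(long_score, 3))  # 1.0 0.577
```

The number of matching words is the same in both cases; only the extra dimensions of the longer description lower its score.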

Conclusion

In this version, we developed a recommender application that suggests cities to visit for three different categories by computing the cosine similarity score between the keywords of the selected category and the city descriptions.

Although the similarity scores are low in general, when we examine the top 5 cities recommended for the Culture, Art and History category, we can see that the system we wrote returns cities that fit the given category. When the other categories are examined, the recommendations can likewise be seen to be sensible and suitable for the given category. You can read the city descriptions to verify the results :)

We have reached the end of our first version. You can reach the code we developed for this version here.

In the next version, we will look at how to make recommendations by applying a different formula together with cosine similarity, so that the rating information is taken into account.

Version-2 (Rating Contribution)

In this version, we will try to make our recommender system more dynamic and better by also using the rating feature in our data set. We wouldn't want to recommend content with bad ratings, would we? :)

Computing the score with CS and rating contribution

For now, we will not care how many ratings were given. As in the previous version, we will compute the CS (cosine similarity) score, but this time we will also take the rating information into account when computing the final score. First, we will write a method that determines the rating contribution. This method takes two parameters, Q and r. The r parameter is the city rating, and the Q parameter is a value that determines how much the rating contribution affects the final score. By increasing or decreasing Q, we decide how much the rating value contributes to the final score compared with the CS score.

Our new method computes the final score by making a positive or negative contribution to the CS score. The rating contribution is positive if the city's rating is above 5 (cities rated 5 or above are assumed to be liked), negative if it is below 5 (cities below 5 are assumed to be disliked), and exactly 0 at a rating of 5. While the rating value varies between 0 and 10, the rating contribution output is a value between -Q and +Q.

For example, if Q=10 is given, the final score for the highest rating (10) is computed as the CS score plus 10% of the CS score, while for the lowest rating (0) it is computed as the CS score minus 10% of the CS score.

The formula used in the method finds the point on the blue line in the graph below that corresponds to the given rating value and returns it as the contribution value. Below, the contribution value our method provides for Q=10 is shown visually:

Method computing the rating contribution (Q=10)

Now let's create a file named rating_extractor.py and add the following code:

class RatingExtractor:
    def __init__(self):
        print("initialized")

    #Returns value between -q and q. for rating input between 0 and 10.
    #Parameters:
        #rating: indicates the rating for the destination
        #q: indicates the percentage of rating for general score. (default is 10.)
    @staticmethod
    def get_rating_weight(rating, q=10):
        if rating > 10 or rating < 0:
            return None
        else:
            m = (2*q) / 10 #10 because rating varies between 0 and 10
            b = -q
            return (m*rating) + b

I will try to explain each method in detail, so you do not have to rely on the code comments alone.

The get_rating_weight() method computes the rating contribution from the given rating and Q parameters and returns it. As I mentioned before, this method can return both positive and negative values. Depending on the value it returns, it makes either a positive or a negative contribution to the final score. (The default value of the Q parameter is set to 10.)
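To see the mapping in action, the sketch below repeats the same linear formula as a standalone function (outside the RatingExtractor class, purely for this example) and prints the contribution for a few sample ratings with Q=10 and Q=100.

```python
def get_rating_weight(rating, q=10):
    """Same linear mapping as RatingExtractor.get_rating_weight above."""
    if rating > 10 or rating < 0:
        return None
    m = (2 * q) / 10  # slope: the output spans 2*q over a rating range of 10
    b = -q            # intercept: rating 0 maps to -q
    return (m * rating) + b

for rating in (0, 2.5, 5, 7, 10):
    print(rating, get_rating_weight(rating), get_rating_weight(rating, q=100))
```

A rating of 0 gives -Q, a rating of 5 gives 0 and a rating of 10 gives +Q; a rating outside the 0 to 10 range returns None, which callers need to handle.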

Developing a new method for the Recommender Engine

Now we will write a method for our RecommenderEngine class that computes the overall score using the cosine similarity score and the rating contribution. Let's add the following method to the RecommenderEngine class:

def calculate_final_score(cs, r):
    amount = (cs / 100) * r

    return cs + amount

The method works as follows:

It takes the CS score and the rating contribution r as parameters
It computes r percent of the CS score (positive or negative) as amount
It adds the computed amount to the CS score and returns the result.

Since amount can be positive or negative, the final score is either higher or lower than the CS score, depending on the rating contribution.

This approach is useful for making use of the rating information, but one point is worth noting. The final score is computed from the rating contribution in a way that depends on the CS score. Because the adjustment is a percentage of the CS score, cities with a high CS score (similarity) are affected more than cities with a low one, especially when high Q values are passed to get_rating_weight().
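A quick numeric check of this last point, using made-up CS scores of 0.40 and 0.10 and a rating weight of 60 (which is what a rating of 8 yields with Q=100):

```python
def calculate_final_score(cs, r):
    amount = (cs / 100) * r  # r percent of the CS score; negative when r is negative
    return cs + amount

rating_weight = 60  # e.g. rating 8 with Q=100

high_cs, low_cs = 0.40, 0.10
high_gain = calculate_final_score(high_cs, rating_weight) - high_cs
low_gain = calculate_final_score(low_cs, rating_weight) - low_cs

print(round(high_gain, 4), round(low_gain, 4))  # 0.24 0.06
```

Both cities have the same rating, yet the city that was already more similar gains four times as much in absolute terms, because the adjustment is proportional to its CS score.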

Now let's write a new method in the RecommenderEngine class that makes recommendations using our new methods. (We still keep the method we wrote in the first version in the class.)

https://medium.com/media/0cce3e9d21e766cc30b5e62e7c095bf2/href

The get_recommendations_include_rating(keywords) method works similarly to the get_recommendations(keywords) method developed in the first version. However, this new method makes its recommendations using the methods we have just developed, taking both the CS score and the rating contribution into account. Let's look at how the method works step by step:

It takes a keywords parameter and applies the following operations to all cities in the data set
It computes the CS score
It computes the rating contribution with Q=10
It computes the final score with the calculate_final_score method, using the CS score and the rating contribution
It converts the 5 cities with the highest final scores to JSON and returns them.
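The ranking part of these steps can be sketched as below. This is a simplified stand-in, not the gist's code: instead of keywords it takes hypothetical (city, CS score, rating) triples, so that the effect of the rating on the order is visible without the data set. The q and top_n parameters are additions for this example.

```python
import json

def get_rating_weight(rating, q=10):
    if rating > 10 or rating < 0:
        return None
    return ((2 * q) / 10) * rating - q

def calculate_final_score(cs, r):
    return cs + (cs / 100) * r

# Hypothetical (city, CS score, rating) triples; in the article the CS score
# comes from cosine_similarity_of and the rating from the data set.
CANDIDATES = [("Alpha", 0.20, 3), ("Beta", 0.19, 9), ("Gamma", 0.10, 10)]

def get_recommendations_include_rating(candidates, q=10, top_n=5):
    scored = []
    for city, cs, rating in candidates:
        weight = get_rating_weight(rating, q)
        scored.append({"city": city, "score": calculate_final_score(cs, weight)})
    # Highest final score first
    scored.sort(key=lambda row: row["score"], reverse=True)
    return json.dumps(scored[:top_n])

ranking = [row["city"]
           for row in json.loads(get_recommendations_include_rating(CANDIDATES))]
print(ranking)  # ['Beta', 'Alpha', 'Gamma']
```

Alpha has the highest CS score, but its rating of 3 pulls it below Beta. Gamma's perfect rating cannot make up for its low similarity at Q=10.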

Request code

First, let's write a method that gets the recommendations from RecommenderEngine:

def get_recommendations_include_rating(keywords):
    return RecommenderEngine.get_recommendations_include_rating(keywords)

Now let's write 3 different requests that get recommendations for our 3 categories using the new method:

# Version 2 requests are below:

top_5_cultural_with_rating = get_recommendations_include_rating(culture_keywords)
city_names_for_cultural_rating = get_top_5_city_names_out_of_json(top_5_cultural_with_rating)
print(city_names_for_cultural_rating)
print("#################")
top_5_summer_with_rating = get_recommendations_include_rating(beach_n_sun_keywords)
city_names_for_summer_rating = get_top_5_city_names_out_of_json(top_5_summer_with_rating)
print(city_names_for_summer_rating)
print("#################")
top_5_party_with_rating = get_recommendations_include_rating(nightlife_keywords)
city_names_for_party_rating = get_top_5_city_names_out_of_json(top_5_party_with_rating)
print(city_names_for_party_rating)
print("#################")

This code prints the recommended cities and their final scores; you can run request.py and examine the results.

In this article we will examine only the Culture, Art and History category, from two different perspectives. First, we will compare the method from the previous version, which recommends using only the CS score, with the new method that computes the final score using both the CS score and the rating contribution.

Comparison of the get_recommendations and get_recommendations_include_rating methods:

I wrote the code below only to compare the two methods, so I did not include it in request.py; you can copy and run it if you wish:

top_5_cultural_cities = get_recommendations(culture_keywords)
city_names_for_cultural = get_top_5_city_names_out_of_json(top_5_cultural_cities)
print(city_names_for_cultural)
print("#################")

top_5_cultural_with_rating = get_recommendations_include_rating(culture_keywords)
city_names_for_cultural_rating = get_top_5_city_names_out_of_json(top_5_cultural_with_rating)
print(city_names_for_cultural_rating)
print("#################")

The output of the two methods is as follows:

[('Athens', 0.21629522817435007),
 ('St. Petersburg', 0.16666666666666666),
 ('Stockholm', 0.14962640041614492),
 ('Milan', 0.140028008402801),
 ('Rome', 0.12171612389003691)]

#################

[('Athens', 0.22927294186481106),
 ('Stockholm', 0.1556114564327907),
 ('St. Petersburg', 0.15333333333333332),
 ('Milan', 0.15123024907502508),
 ('Rome', 0.13145341380123987)]

Above, the scores and city rankings for the two methods are given. As you can see, with the newly developed method (the lower list), Stockholm rose to second place while St. Petersburg dropped to third. Let's examine why:

As seen in our data set, Stockholm's rating is 7, while St. Petersburg's rating is 1. Therefore our algorithm computes a final score lower than the CS score for St. Petersburg, and a higher one for Stockholm. As a result, Stockholm overtakes St. Petersburg in the final score and rises to second place. Here we can see that the method and formulas we wrote increase the score of highly rated content and decrease the score of low-rated content. You can also examine the rating information of the other cities in our data set to observe the reasons for the changes in the scores in general.
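The swap can be checked by hand with the two formulas and the numbers quoted above. The sketch below uses standalone copies of the functions; the CS scores are taken from the first output list and the ratings from the paragraph above.

```python
def get_rating_weight(rating, q=10):
    return ((2 * q) / 10) * rating - q   # rating is assumed to be within 0..10

def calculate_final_score(cs, r):
    return cs + (cs / 100) * r

# CS scores and ratings as quoted in the article
stockholm = calculate_final_score(0.14962640041614492, get_rating_weight(7))
st_petersburg = calculate_final_score(0.16666666666666666, get_rating_weight(1))

print(round(stockholm, 4), round(st_petersburg, 4))  # 0.1556 0.1533
```

A rating of 7 gives a contribution of +4, so Stockholm gains 4% of its CS score. A rating of 1 gives -8, so St. Petersburg loses 8% of its CS score, which is just enough for the two to trade places.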

Comparison of get_recommendations_include_rating for Q = 10 and Q = 100:

Now we will compare our new method with different Q parameters. If you remember, the rating contribution changes in direct proportion to the value of the Q parameter. As printed in the previous comparison, for Q=10, the 5 highest-scoring cities in the Culture, Art and History category are:

[('Athens', 0.22927294186481106),
 ('Stockholm', 0.1556114564327907),
 ('St. Petersburg', 0.15333333333333332),
 ('Milan', 0.15123024907502508),
 ('Rome', 0.13145341380123987)]

Now we will set the Q parameter to 100 and examine the results. You can increase the parameter value by going to recommender_engine.py and changing the number 10 in the get_recommendations_include_rating method to 100:

rating_contribution = RatingExtractor.get_rating_weight(rating,100)

Now let's look at our new results:

[('Athens', 0.3460723650789601),
 ('Milan', 0.2520504151250418),
 ('Rome', 0.21908902300206645),
 ('Stockholm', 0.2094769605826029),
 ('Venice', 0.17777777777777776)]

When we set the Q parameter to 100, we can see that our results are quite different:

St. Petersburg is no longer even in the top 5; since it has a rating of 1, its final score dropped to a very low value once the Q parameter was raised.
Stockholm dropped to fourth place while Milan and Rome rose to second and third, because, as can be seen below, the ratings of Milan and Rome are higher than Stockholm's.

Rating comparison for Rome, Milan and Stockholm
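To see how strongly Q changes the picture, the sketch below recomputes the final scores of St. Petersburg (rating 1) and Stockholm (rating 7) for both Q values, using the CS scores from the first version's output. The final_score helper is a hypothetical name that merges the two formulas for brevity.

```python
def final_score(cs, rating, q):
    """CS score adjusted by the rating contribution for a given Q."""
    weight = ((2 * q) / 10) * rating - q   # in [-q, +q] for ratings in 0..10
    return cs + (cs / 100) * weight

# (CS score, rating) pairs quoted in the article
cities = {"St. Petersburg": (0.16666666666666666, 1),
          "Stockholm": (0.14962640041614492, 7)}

for q in (10, 100):
    print(q, {name: round(final_score(cs, rating, q), 4)
              for name, (cs, rating) in cities.items()})
```

With Q=100 a rating of 1 yields a contribution of -80, so St. Petersburg keeps only 20% of its CS score (roughly 0.033), while Stockholm's +40 lifts it to about 0.2095, matching the list above.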

I recommend examining how the results change with different Q parameters for the other city categories as well.

Version-2 Conclusion

In our second version, we developed a new method that recommends cities using the rating information in addition to cosine similarity. Using information such as ratings is quite important in recommender systems, because we generally want to recommend content that people like.

You can reach all the code of the second version here.

In the next version, we will try to improve our recommender system by also taking the number of ratings into account.

Version-3 (Rating Threshold)

A piece of content having a high rating does not mean that this rating is reliable. Imagine two different pieces of content named A and B. A has a rating of 4.7 based on the ratings of 500,000 people, and B has a rating of 5 based on the ratings of 10 people. Which one would you want to recommend to a friend? How reliable can B's rating of 5 be, considering that only 10 people rated it? Thanks to the rating_count feature, we will create a threshold parameter for ratings, and our recommender system will not put much weight on content (cities) whose rating count is below this parameter when computing the rating contribution. This way, the rating information of cities rated by only a few people will not count for much.

Computing the rating weight with the rating count feature

Multiplier formula

The M multiplier computed with the formula above is multiplied by the rating contribution, which gives us the rating weight used in computing the final recommendation score. T represents the threshold and c represents the number of ratings a city has received. This formula is designed to work as follows:

M can vary between 0.0 and 1.0
When the T and c parameters are equal, M is always approximately 0.50.

There is no special reason for using the number e in this formula; we could have built the formula with another number as well (in that case the number 0.68 would have to change). I used e to make it look cool :P

The most important point of this formula is that it gives a value of about 0.50 when T (threshold) and c (rating count) are equal. If the number of ratings a city has received is lower than the threshold we entered, M is roughly in the range 0.0–0.50; if it is higher than the threshold, M is roughly between 0.50 and 1.0. But whatever the rating count and the threshold are, M can never be higher than 1.0.
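Judging from the get_rating_weight_with_quantity code in this section, the multiplier is M = e^(-0.68 * T / c). The sketch below evaluates it for a few sample rating counts; the multiplier function name and the sample values of c are chosen for illustration only.

```python
from math import e, log

def multiplier(c, T):
    """M = e^(-0.68 * T / c), as used by get_rating_weight_with_quantity."""
    return e ** ((-T * 0.68) / c)

T = 1_000_000
for c in (10_000, 100_000, 1_000_000, 5_000_000, 100_000_000):
    print(c, round(multiplier(c, T), 4))

# The constant that would give exactly 0.5 at c == T is ln(2)
print(round(log(2), 4))  # 0.6931
```

M is practically 0 when c is far below T, about 0.507 when c equals T, and it approaches 1.0 without ever reaching it as c grows. The value at c equal to T is slightly above 0.5 because 0.68 is a rounded stand-in for ln(2).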

Now let's go to the rating_extractor.py file and write a new method. Here we only multiply the rating contribution by M, but we will write this as a separate method so that you can still use the previously developed method as it is.

First, we will import e into our file:

from math import e

Then, let's add the following method to the RatingExtractor class:

@staticmethod
def get_rating_weight_with_quantity(rating, c, T, q=10):
    if rating > 10 or rating < 0:
        return None
    else:
        m = (2*q) / 10 #10 because rating varies between 0 and 10
        b = -q
        val = (m*rating) + b

        M = e**((-T*0.68)/c)

        return val * M

The method works as follows:

It takes the rating, c (rating count), T (threshold) and Q parameters.
We have already seen the rating and Q parameters in the previous sections.
It computes the rating contribution
It computes the M multiplier from the given parameters
It multiplies the rating contribution by M and returns the rating weight.
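As a usage sketch, the example below calls a standalone copy of the method for two hypothetical cities that share the same rating of 9 but differ in how many ratings they received. The rating counts and the threshold of one million are sample values.

```python
from math import e

def get_rating_weight_with_quantity(rating, c, T, q=10):
    """Rating contribution scaled by M; mirrors the method shown above."""
    if rating > 10 or rating < 0:
        return None
    val = ((2 * q) / 10) * rating - q
    M = e ** ((-T * 0.68) / c)  # note: c must be greater than 0
    return val * M

T = 1_000_000
few_ratings = get_rating_weight_with_quantity(9, c=100_000, T=T)
many_ratings = get_rating_weight_with_quantity(9, c=5_000_000, T=T)

print(round(few_ratings, 3), round(many_ratings, 3))
```

Without the multiplier both cities would get a contribution of 8. With it, the city with 100,000 ratings ends up with a weight close to 0, while the city with 5 million ratings keeps most of its contribution (a little under 7). A city with a rating count of 0 would cause a division by zero, so such rows need to be handled before calling the method.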

Developing a new method in the RecommenderEngine class

Now let's open recommender_engine.py and add a new method to the RecommenderEngine class (we still keep the methods developed in the previous sections). The method we add is quite similar to the ones developed earlier, but this time, together with the city description and rating, we will use the rating count and the T (threshold) parameter.

https://medium.com/media/961233d0c30c17be6c90b2cde739acbf/href

Our method works as follows:

It takes a keywords parameter and performs the following steps for all cities in the data set
It computes the CS score.
For each city, it computes the rating weight from the rating, the rating count, a threshold of T = 1,000,000 (I chose 1 million because the rating counts in our data set range from 100 thousand to 5 million) and Q=10.
It computes the final score by calling the calculate_final_score method (developed in the previous section) with the CS score and the rating weight.
It converts the 5 cities with the highest scores to JSON and returns them.

Request code

Next, we need to send requests for the 3 categories from request.py using our new method.

First, let's write a method that gets the recommendations using our new method:

def get_recommendations_include_rating_count_threshold(keywords):
    return RecommenderEngine.get_recommendations_include_rating_count_threshold(keywords)

Now let's make 3 requests that get the recommendations for the 3 categories:

# Version 3 requests are below:

top_5_cultural_with_rating_count_threshold = get_recommendations_include_rating_count_threshold(culture_keywords)
city_names_for_cultural_rating_count_threshold = get_top_5_city_names_out_of_json(top_5_cultural_with_rating_count_threshold)
print(city_names_for_cultural_rating_count_threshold)
print("#################")

top_5_summer_with_rating_count_threshold = get_recommendations_include_rating_count_threshold(beach_n_sun_keywords)
city_names_for_summer_rating_count_threshold = get_top_5_city_names_out_of_json(top_5_summer_with_rating_count_threshold)
print(city_names_for_summer_rating_count_threshold)
print("#################")

top_5_party_with_rating_count_threshold = get_recommendations_include_rating_count_threshold(nightlife_keywords)
city_names_for_party_rating_count_threshold = get_top_5_city_names_out_of_json(top_5_party_with_rating_count_threshold)
print(city_names_for_party_rating_count_threshold)
print("#################")

The code block above prints the recommendations for all 3 categories based on their final scores. You can run request.py to see the results for every category, but in this article we will only examine the results for the Culture, Art and History category.

Comparing the results for different threshold values

Let's make experimental requests for the Culture, Art and History category with different T values. You can change the threshold value in the get_recommendations_include_rating_count_threshold method of the RecommenderEngine class. Also, to see the effect of the threshold more clearly, let's increase Q to 100 (as explained in the previous chapters, the higher Q is, the more the rating influences the final score relative to the CS score).

T = 100,000:

[('Athens', 0.33318171469723395),
 ('Milan', 0.24587898720843948),
 ('Rome', 0.21192640793273687),
 ('Stockholm', 0.18358642633975064),
 ('Venice', 0.17262307588744202)]

T = 1,000,000:

[('Athens', 0.26188415260156817), 
('Milan', 0.2035910531885378), 
('Rome', 0.16707033294390228), 
('Stockholm', 0.14983344608755947), 
('Barcelona', 0.14757848986361075)]

T = 2,500,000:

[('Athens', 0.2257870828894539), 
('Milan', 0.16719580286435054), 
('St. Petersburg', 0.158824470447676), 
('Stockholm', 0.14962644254339), 
('Rome', 0.13613352041126298)]

As the results above show, the city in 5th place changes depending on whether the threshold is 100 thousand or 1 million. With the lower threshold the fifth city is Venice, while with the higher threshold it is Barcelona. Let's see why:

Both cities have a rating of 8, but Barcelona has a higher rating count, while Venice has a higher CS score than Barcelona. So when the threshold is 100,000, both cities get a good rating contribution, and since Venice's CS score is higher, we see Venice in fifth place.

But when the threshold is 1,000,000, the rating contribution scores are calculated as follows (Q=100):

Barcelona: 34
Venice: 26.8

Because Barcelona has a higher rating contribution and Q is also high, the final score comes out higher for Barcelona, so we see Barcelona in fifth place.
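As a rough sanity check, the two contribution scores can be traced back through the formula. The sketch below is only an illustration: the helper names are made up, and the rating counts it prints are back-calculated from the rounded scores above, not read from the dataset.

```python
from math import log


def unscaled_contribution(rating, q):
    # Same linear mapping as in get_rating_weight_with_quantity: 0..10 -> -q..q
    return (2 * q / 10) * rating - q


def implied_rating_count(weight, rating, q, T):
    # weight = unscaled * M and M = e^(-0.68 * T / c), so c = -0.68 * T / ln(M)
    M = weight / unscaled_contribution(rating, q)
    return -0.68 * T / log(M)


base = unscaled_contribution(8, 100)  # contribution of a rating of 8 before M is applied
barcelona_c = implied_rating_count(34, 8, 100, 1_000_000)
venice_c = implied_rating_count(26.8, 8, 100, 1_000_000)
print(base, round(barcelona_c), round(venice_c))
```

With a rating of 8 and Q = 100 both cities start from the same unscaled contribution of 60, so the gap between 34 and 26.8 comes entirely from M. The implied rating counts are roughly 1.2 million for Barcelona and 0.84 million for Venice, which is consistent with Barcelona having more ratings.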

When the threshold is 2,500,000 we see St. Petersburg in 3rd place, even though at lower thresholds it did not even appear in 4th or 5th place. I will leave finding the reason to you. Examine the dataset for St. Petersburg and go over the methods we wrote again to try to understand why. If you have any questions, feel free to ask me. :)

Beyond that, I recommend changing the parameter values and experimenting with different ones, examining the values in the dataset, and inspecting the results you get for all categories, to understand how the methods we wrote work and what value they bring to recommender systems.

Version-3 Conclusion

In the third version, we developed a method that first calculates the CS score of each city and a rating weight from the rating and rating count features, and then recommends cities by calculating a final score from the CS score and the rating weight. Since recommendations should take the reliability of the data into account, we saw how this method lets our recommender system favor content with a higher number of feedback entries (what counts as high will vary from application to application).

You can access all of the code here.

In the next version, we will develop a method that makes recommendations by processing different types of feedback.

Versiyon-4

In this version, we will continue to improve our recommender system by also using the positive_review and negative_review features.

Compared to the previous chapters, this chapter talks more about theory and experimental results. If you are only interested in the code, you can start from the Implementation section.

Sometimes our applications have more than one type of feedback for their content, such as reviews and ratings. As you can guess, these are not the same kind of feedback: a rating is given on a fixed numeric range (0–10 in our application), whereas a review is usually given as text. Let's assume we have classified our reviews as positive and negative (perhaps we can write a separate article on classifying reviews as positive or negative); review feedback can then be treated as binary feedback (0 or 1).

For each city, our dataset has features named positive_review and negative_review that hold the number of positive and negative reviews the city has received.

One of the problems recommender applications have to deal with is using different types of feedback together to make more meaningful recommendations. Although there are different ways to do this, in this article we will combine the different feedback types by converting reviews to ratings with a custom method.

Converting reviews to ratings

The simplest way to convert a review to a rating would be to pick one fixed rating value for positive reviews and another for negative reviews. Once the reviews were converted with these values, the average rating could be recalculated from the original ratings plus the converted ones. But this would not be a very good approach. For example, if the values were chosen as 0 and 10, reviews would have far too much influence on the content (especially when the content's average rating is already close to 0 or 10). Different values could be chosen to reduce this extreme effect, but if they were chosen as 2.5 and 7.5, a different problem would appear. For content whose average is above 7.5, a positive review automatically converted to a rating of 7.5 would have a negative effect despite being positive, because it is lower than the content's average rating. Likewise, for content whose average rating is already below 2.5, negative reviews automatically converted to 2.5 could have a positive effect. For these reasons, it is worth developing a better method.
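The problem with fixed conversion values can be seen in a short sketch. The function name naive_updated_rating and the sample counts are assumptions for illustration; the 7.5 and 2.5 values are the ones discussed above.

```python
def naive_updated_rating(r, rc, pf, bf, positive_value=7.5, negative_value=2.5):
    """Recalculate the average when every review is converted to a fixed rating.

    r: average rating, rc: rating count,
    pf: positive review count, bf: negative review count.
    """
    total = r * rc + pf * positive_value + bf * negative_value
    return total / (rc + pf + bf)


# Hypothetical content rated 9.0 by 100 users that receives 10 positive reviews:
print(naive_updated_rating(9.0, 100, 10, 0))

# Hypothetical content rated 1.0 by 100 users that receives 10 negative reviews:
print(naive_updated_rating(1.0, 100, 0, 10))
```

In the first case the average drops below 9.0 even though every review is positive, and in the second it rises above 1.0 even though every review is negative.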

The method we will develop handles positive and negative reviews as follows:

For each positive review, the distance between the content's average rating and the highest value on the rating scale (10 in our application) is calculated, and half of this distance is added to the content's average rating to convert the positive review to a rating.
For each negative review, the distance between the content's average rating and the lowest value on the rating scale (0 in our application) is calculated, and half of this distance is subtracted from the content's average rating to convert the negative review to a rating.

The formulas used to convert positive and negative reviews to ratings are given below as Rp and Rn respectively (r is the content's average rating):

Rp = r + (10 - r) / 2

Rating conversion for a positive review

Rn = r - (r - 0) / 2 = r / 2

Rating conversion for a negative review
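The two conversions can be sketched in a few lines. The function name convert_review_ratings and its max_rating and min_rating parameters are assumptions added for illustration; the arithmetic follows the description above.

```python
def convert_review_ratings(r, max_rating=10, min_rating=0):
    """Return (Rp, Rn): the rating values one positive or one negative review converts to."""
    positive_rating = r + (max_rating - r) / 2  # halfway between r and the top of the scale
    negative_rating = r - (r - min_rating) / 2  # halfway between r and the bottom of the scale
    return positive_rating, negative_rating


print(convert_review_ratings(6))  # (8.0, 3.0)
```

Because both values are measured from the content's own average, a positive review always converts to a rating above the average and a negative review to one below it, which avoids the fixed-value problem described earlier.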

For example, for content with an average rating of 6, each negative review is converted to a rating of 3 and each positive review to a rating of 8, and these are added to the existing ratings. Then, before the average is used in the score calculation, it is recalculated from the content's existing ratings plus the newly converted ones. The results of the review conversion for different ratings, rating counts and review counts are given in the table below.

Rating value calculation for average ratings and reviews

Looking at the rating recalculated after the reviews are taken into account: when the content's rating is close to the highest possible rating and the positive and negative review counts are not very different, our method has a negative effect even though there are more positive reviews than negative ones. Likewise, when the content's rating is close to the lowest possible rating and the negative and positive review counts are not very different, our method has a positive effect even though there are more negative reviews than positive ones. For example, when the average rating is 7.2 and the negative and positive review counts are equal, the result is 6.65. This is because the distance from 0 to 7.2 is larger than the distance from 7.2 to 10, so the calculated value pulls the rating down. However, since content with a rating close to either extreme usually does not have similar numbers of positive and negative reviews, this issue may not hurt our system much. Moreover, content usually receives many more ratings than reviews, so the numbers in the tests above were chosen accordingly, and as a result the effect of reviews is not very large. This effect could be increased by adding another parameter (for example, adding 10 ratings with the calculated value for each positive review; right now we add 1 rating per review).
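The recalculation and its edge case can be reproduced with a small sketch. The function name updated_rating, the review_weight parameter (one way to realize the extra parameter mentioned above) and the sample counts are all assumptions for illustration.

```python
def updated_rating(r, rc, pf, bf, review_weight=1):
    """Recalculate the average rating after converting reviews to ratings.

    r: average rating, rc: rating count,
    pf: positive review count, bf: negative review count,
    review_weight: hypothetical number of ratings each review counts as.
    """
    positive_rating = r + (10 - r) / 2
    negative_rating = r - r / 2
    pf_weighted = pf * review_weight
    bf_weighted = bf * review_weight
    total = r * rc + pf_weighted * positive_rating + bf_weighted * negative_rating
    return total / (rc + pf_weighted + bf_weighted)


# Hypothetical counts: 100 ratings, equal numbers of positive and negative reviews.
print(round(updated_rating(7.2, 100, 50, 50), 2))  # 6.65

# More positive than negative reviews, yet the average still drops below 7.2.
print(round(updated_rating(7.2, 100, 60, 40), 2))  # 6.9

# Counting each review as 10 ratings makes the reviews dominate the result.
print(round(updated_rating(7.2, 100, 50, 50, review_weight=10), 2))  # 6.2
```

With these assumed counts the 7.2 case lands on 6.65, and the second call shows the side effect described above: 60 positive against 40 negative reviews still lowers a 7.2 average, because each negative review sits 3.6 points below the average while each positive review sits only 1.4 points above it.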

Implementation

Now that we have seen how our method works and what kind of results it gives, let's open rating_extractor.py and add the following method to the RatingExtractor class:

    @staticmethod
    def get_rating_with_count_and_reviews(r, rc, pf, bf):
        if r > 10 or r < 0:
            return None
        else:
            positive_diff = (10 - r) / 2
            positive_rating = r + positive_diff

            negative_diff = r / 2
            negative_rating = r - negative_diff

            updated_rating = ((r * rc) + (pf * positive_rating) + (bf * negative_rating)) / (rc + pf + bf)

            return RatingExtractor.get_rating_weight_with_quantity(updated_rating,rc,1000000,10)

The method works as follows:

It takes the r (rating), rc (rating count), pf (positive review count) and bf (negative review count) parameters.
It calculates the converted rating value for positive reviews.
It calculates the converted rating value for negative reviews.
It calculates the updated rating from the existing average rating and the newly converted ratings.
It calls the method developed in the previous chapter with the updated rating, the rating count, T = 1,000,000 (threshold) and Q = 100 (rating importance parameter), and returns the result as the rating contribution.

Adding a new method to the RecommenderEngine class

Now let's open recommender_engine.py and add the following method to the RecommenderEngine class. Although it is similar to the methods we developed before, this time we will also use the positive and negative review counts.

https://medium.com/media/aa32be4d26c0376de71380e2718a7435/href

The method works as follows:

It takes the keywords parameter and applies the following steps to the cities.
It calculates the CS score.
It calculates the rating contribution weight using the rating, rating count, positive review count and negative review count parameters. This time the T and Q parameters are used directly in the new method of the RatingExtractor class.
It calls the calculate_final_score method (developed in the previous chapters) with the CS score and the rating weight to calculate the final score.
It converts the 5 cities with the highest scores to JSON and returns them.

Request

We will add requests to request.py that get recommendations for all three categories.

First, let's write a method that gets the recommendations using the new method in the RecommenderEngine class:

def get_recommendations_include_rating_count_threshold_positive_negative_reviews(keywords):
    return RecommenderEngine.get_recommendations_include_rating_count_threshold_positive_negative_reviews(keywords)

Now let's use the new method to get the recommendations for all 3 categories and print them:

# Version 4 requests are below:

top_5_cultural_with_rating_count_threshold_reviews = get_recommendations_include_rating_count_threshold_positive_negative_reviews(culture_keywords)
city_names_for_cultural_rating_count_threshold_reviews = get_top_5_city_names_out_of_json(top_5_cultural_with_rating_count_threshold_reviews)
print(city_names_for_cultural_rating_count_threshold_reviews)
print("#################")

top_5_summer_with_rating_count_threshold_reviews = get_recommendations_include_rating_count_threshold_positive_negative_reviews(beach_n_sun_keywords)
city_names_for_summer_rating_count_threshold_reviews = get_top_5_city_names_out_of_json(top_5_summer_with_rating_count_threshold_reviews)
print(city_names_for_summer_rating_count_threshold_reviews)
print("#################")

top_5_party_with_rating_count_threshold_reviews = get_recommendations_include_rating_count_threshold_positive_negative_reviews(nightlife_keywords)
city_names_for_party_rating_count_threshold_reviews = get_top_5_city_names_out_of_json(top_5_party_with_rating_count_threshold_reviews)
print(city_names_for_party_rating_count_threshold_reviews)
print("#################")

The code above gets the results for all 3 categories and prints them. You can run request.py to see and examine the results. Since the method and its possible results were examined in detail earlier in this chapter, I will skip the result analysis here. Only the results for the Culture, Art and History category are given below:

[('Athens', 0.2622560540924768), 
('Milan', 0.2040068651858985), 
('Rome', 0.16752794267650856), 
('Stockholm', 0.14984473241175314), 
('Barcelona', 0.14831614523091158)]

Still, I recommend examining the results for all categories in detail. Comparing the results from version 3 with the ones from this version while inspecting the dataset will definitely be useful. What difference do you see, and why do you think that is? If you have any questions, you can ask me in the comments :)

Conclusion

In this version, we saw how to use different types of feedback together in recommender systems by converting one into the other. We examined how different approaches can have negative effects and how the system we developed can also produce undesired results in some cases. What formula would you use to get around these problems? It is worth thinking about.

With this version, we have reached the end of our Recommender System application. In this article, we developed a system that makes recommendations with different techniques in situations where we know almost nothing about the user (the cold start problem), by only asking users to pick a category. I hope you followed along to the end and enjoyed it. The formulas and techniques developed here are my own ideas, so they are not perfect, as we have seen or mentioned from time to time throughout the articles. So if you see areas where the system could be improved further, please mention them in the comments :)

You can download the final version of our project here.

Goodbye.

Extra

You may be wondering why some of the features in our dataset were not used. As I mentioned at the beginning of the first chapter, the techniques developed in this series are part of my master's project. Another part of that project was developing a cross-platform mobile application with Flutter, and I used some of those features there. You can see screenshots of that application below:

Screenshots from the mobile application

If building a UI with Flutter for the system we developed in this series interests you, please let me know. Maybe we can build the application together in another article :)

Take care!